Abstract

With the growing amount of personal information exchanged over the Inter- 
net, privacy is becoming more and more a concern for users. In particular, per- 
sonal information is increasingly being exchanged in Identity Management (IdM) 
systems to satisfy the increasing need for reliable on-line identification and au- 
thentication. One of the key principles in protecting privacy is data minimization. 
This principle states that only the minimum amount of information necessary to 
accomplish a certain goal should be collected. Several "privacy-enhancing" IdM 
systems have been proposed to guarantee data minimization. However, currently 
there is no satisfactory way to assess and compare the privacy they offer in a pre- 
cise way: existing analyses are either too informal and high-level, or specific for 
one particular system. In this work, we propose a general formal method to anal- 
yse privacy in systems in which personal information is communicated and apply 
it to analyse existing IdM systems. We first elicit privacy requirements for IdM 
systems through a study of existing systems and taxonomies, and show how these 
requirements can be verified by expressing knowledge of personal information in a 
three-layer model. Then, we apply the formal method to study four IdM systems, 
representative of different research streams, analyse the results in a broad context, 
and suggest improvements. Finally, we discuss the completeness and (re)usability 
of the proposed method. 

Keywords: Privacy, Identity Management, Formal methods, Data minimization, 
Detectability, Associability 

1 Introduction 

As communication between service providers and their customers increasingly takes
place on-line, reliable on-line identification and authentication is becoming increasingly
crucial. The traditional username/password paradigm is not satisfactory: in an
age of identity theft, identity fraud, phishing, and hacking, it provides too little assurance
to organisations [65], while also providing little user-friendliness to customers.
As novel solutions for identification and authentication emerge, they give rise to novel
privacy risks for users.

Identity management [67, 46, 35] is an emerging technology to outsource user identification,
and possibly authentication, to an "identity provider". Identity providers endorse
information about their users, and provide means for authenticating a user in a
service provision. To organisations, identity providers offer reduced cost for obtaining
reliable user information; to users, they offer increased convenience by letting them
re-use authentication credentials.

While more business means more need for identification, it also means more pri- 
vacy issues caused by the increasing amount of personal information being collected. 
Privacy outcries have been reported concerning both personal information being used 
by companies for secondary purposes [41, 73], and it being stolen and then abused by
third parties [2, 40]. Both issues underline the importance of the data minimization
principle: companies should only collect the minimal amount of personal information
needed to achieve a certain purpose (and keep it only while it is needed) [57].
In several jurisdictions, privacy and data protection regulations (e.g., EU Directive 
95/46/EC, HIPAA) impose stringent privacy requirements on the handling of personal 
information. Media attention to privacy and organisations' need to comply with regulations
have spurred research interest in developing privacy-enhancing IdM systems
[8, 26, 72], i.e., IdM systems aiming to reduce the amount of personal information that
is exposed to communication partners.

Although different privacy-enhancing IdM systems have been proposed [8, 26, 72],
so far there is no satisfactory way to accurately and precisely assess and compare the 
privacy they offer. Several works 0] QT| [41] [47] sketch privacy problems in identity
management at a high level, but do not comprehensively analyse the differences
between existing solutions. Other studies [1, 43] consider privacy aspects of IdM systems
as part of a general comparison, but the criteria used to compare privacy are neither
formal nor detailed, and thus their privacy assessments do not offer much insight
into differences between systems and the reasons behind them. Existing proposals for 
privacy-enhancing IdM systems [8, 26, 72] assess the privacy of their own solution,
but the terminology and criteria used are specific to the setting at hand, making it hard 
to compare different systems. In summary, currently there is no method to compare 
different systems in a way that is precise and verifiable, and that offers enough detail 
to provide real insight into the privacy differences that exist between systems. 

Formal methods provide the machinery to perform such a comparison. Over the 
years, formal methods have arisen as an important tool to analyse security of commu- 
nication in IT systems 0H7JI52, 60 1. The idea is to express communication protocols 
in a suitable formal model, and then verify whether such a model satisfies, e.g., authentication
properties ifTTl or secrecy properties [9]. Secrecy, in particular, expresses one
aspect of privacy; namely, whether a certain piece of information is known by some 
party in a protocol. However, it leaves unanswered a question which is equally im- 
portant in the setting of IdM systems; namely, whether a certain piece of information 
can be linked to its corresponding data subject (who, in general, might not be a direct 
participant in the communication under analysis). 

Recently, several research efforts have focused on using formal methods to analyse
privacy properties [16, 31, 32, 33, 69]: in particular, privacy properties have been
studied in application domains such as electronic toll collection [31], e-voting [32, 33],
and RFID systems [16]. However, in some cases the properties defined and verified are
specific to their respective settings [31, 32, 33]. In other cases [16], the focus is on linking
messages rather than interpreting them as personal information about a data subject,
as needed for the analysis of IdM systems. Moreover, these methods are focused on
attackers rather than authorised insiders, so they do not analyse the knowledge of different
(coalitions of) actors operating within one system. In the context of IdM systems,
formal methods have been used to consider privacy aspects of non-repudiation [69].

In our previous works [70, 71], we introduced a formal method to analyse the
privacy guarantees offered by protocols in which personal information is exchanged.
We introduced a three-layer model to reason about the knowledge of personal information
held by different (coalitions of) communicating parties [70, 71]. The model
captures that personal information in different contexts may satisfy different privacy 
properties; and that different pieces of information may have the same contents. We 
also showed how this model is determined from observations of communication be- 
tween the different parties. However, while the previous works presented the basic 
building blocks for privacy analysis, their model only captures communication that 
uses a limited set of cryptographic primitives; it does not offer an implementation of 
the analysis method; and finally, it does not consider which privacy requirements are 
relevant for IdM systems. 
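
To make the two observations about the three-layer model concrete, the following is a minimal sketch and not the formalisation of [70, 71]. The layer names (context, information, contents), the class `ContextItem`, and all data values are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    """A piece of data as observed in one context (hypothetical naming)."""
    context: str  # where it was observed, e.g. a database or a session
    name: str     # what it is called there


# Layer 1 -> layer 2: which piece of personal information an observed item is.
INFORMATION_OF = {
    ContextItem("address-db", "city"): ("user-1", "city"),
    ContextItem("session-1", "city"): ("user-1", "city"),
    ContextItem("session-2", "city"): ("user-2", "city"),
}

# Layer 2 -> layer 3: the actual contents of each piece of information.
CONTENTS_OF = {
    ("user-1", "city"): "Exampletown",
    ("user-2", "city"): "Exampletown",
}


def same_information(a: ContextItem, b: ContextItem) -> bool:
    """True if both observed items describe the same fact about the same person."""
    return INFORMATION_OF[a] == INFORMATION_OF[b]


def same_contents(a: ContextItem, b: ContextItem) -> bool:
    """True if both observed items merely carry the same value."""
    return CONTENTS_OF[INFORMATION_OF[a]] == CONTENTS_OF[INFORMATION_OF[b]]
```

In this sketch, the items seen in `session-1` and `session-2` have equal contents but represent different information, so observing equal values is not by itself enough to conclude that two items concern the same data subject.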

In this work, we extend our previous analysis method to perform a formal privacy 
analysis of IdM systems. Specifically, our contributions are as follows: 

• We present a comprehensive set of detailed privacy requirements for IdM systems,
elicited through the analysis of existing systems [8, 26, 72] and taxonomies
[10, 41].

• We extend our previous formal method [70, 71] for the analysis of knowledge
of personal information to cover additional primitives and cryptographic proto- 
cols (specifically, zero-knowledge proofs and issuing protocols for anonymous 
credentials). 

• We implement the analysis method in Prolog. 

• We validate the proposed method by analysing four IdM systems, representative 
of different research streams in identity management. We show how the analysis
makes it possible to compare the privacy such systems offer and to draw
recommendations for their improvement. 

• We discuss the main challenges concerning the (re)usability of the proposed
method. Although the discussion is centred on our method, the identified chal- 
lenges can be generalised to any formal method for the analysis of IdM systems. 

This paper is structured as follows. First, we present an overview of identity management
together with the privacy requirements to be satisfied by IdM systems, and
introduce four representative systems using a case study (§2). Next, we present our
formal analysis method and its Prolog implementation (§3). We then apply the formal
method to the systems, and discuss the results (§4). We discuss the completeness and
(re)usability of the proposed method (§5). We conclude the paper by discussing related
work (§6), drawing some conclusions, and pointing to interesting directions for future
work (§7).



2 Identity Management 

In this section, we present the IdM systems we will analyse. First, we provide an 
overview of identity management. Then, we discuss the requirements for IdM systems 
relevant to privacy by data minimization. Finally, we present four representative IdM 
systems using a case study. 
Figure 1: Taxonomy of IdM systems. IdM systems are either credential-focused or
relationship-focused (Linking Service Model). Credential-focused systems rely either on
cryptography (Identity Mixer, Smart Certificates) or on tamper-resistant devices
(Smartcard Scheme).



2.1 Overview 

As service providers are offering more and more customised services to their users, they need
to collect more and more of their personal information. Traditionally, each service 
provider would manage the accounts of users separately. However, this identity man- 
agement model, called the isolated user identity management model P31l . has disadvan- 
tages for both users and service providers: the user has to manually provide and update 
her information and keep authentication tokens for each service provider, whereas it is 
hard for the service provider to obtain guarantees that the information given by the user 
is correct. 

This problem is commonly solved by delegating the task of managing and endors- 
ing identity information to identity providers. Identity management is split up in two 
phases: registration and service provision. At registration, users establish accounts at 
(possibly multiple) identity providers. (This includes identification: i.e., the user trans- 
fers her attributes to the identity provider, and the identity provider possibly checks 
them. However, both the transfer and checking of attributes performed by the identity 
provider are out of scope of this work.) Service provision is the phase when a user 
requests a service from a service provider: at this point, user attributes required for the 
service provision need to be collected and sent to the service provider. 

Identity Management (IdM) systems can be divided into two main categories [1],
depending on whether or not the identity providers are involved in the service provision
phase: credential-focused and relationship-focused systems (also known as network-based
and claim-based systems [4]). Figure 1 shows a taxonomy of IdM systems.

In credential-focused IdM systems, the user gets long-term credentials from the
identity provider in the registration phase, which she herself can present to the service
providers in the service provision phase. These credentials contain her identity
attributes. We can distinguish between two mechanisms employed to prevent the
user from tampering with them, namely cryptography and tamper-resistant devices.
Credential-focused systems relying on cryptography include CardSpace [55], U-Prove
[58], and Identity Mixer [8]. The system presented in [72] relies on the use of a smartcard
as a tamper-resistant device.

In relationship-focused IdM systems, in contrast, the identity provider presents the
attributes to the service provider. During the registration phase, identity providers establish
shared identifiers to refer to each other's identity of the user. During the service
provision phase, the user authenticates to an identity provider. The identity provider
then sends attributes to the service provider (possibly indirectly via the user). If needed,
the shared identifiers established during registration are used to collect (or aggregate
[26]) attributes held by other identity providers without the user having to authenticate
to them as well. The combination of reliance on authentication performed by another
party and exchange of identity information is sometimes referred to as federated identity
management [45, 65]. Note that this term is also used to describe the general concept
of sharing information between different domains [4] or the mere use of multiple
identity providers [1]. To avoid confusion, we will not use it further. Relationship-focused
systems include Liberty Alliance [42], Shibboleth [35], and the linking service
model [26].

Because large amounts of personal information are processed by many different parties
in IdM systems, privacy has become a major concern [41, 68]. In such systems,
privacy threats posed by authorised insiders are nowadays considered to be a critical
problem besides outsider attacks on cryptographic protocols [39]. Insiders may
compile comprehensive user profiles to sell or use for secondary purposes such as
marketing. These profiles can include sensitive information that is explicitly transferred
by the user, but also information that is transferred implicitly [68]. For instance,
the mere fact that a user performed a transaction at a certain service provider
may be privacy-sensitive. In addition, profiles held by different parties may be combined
[68] to compile even more comprehensive profiles. Privacy-enhancing IdM systems
(e.g., [8, 26, 72]) aim to minimise the amount of information disclosed as well as
to prevent different pieces of information from being linked together [41].

2.2 Requirements 

In this section, we present and discuss the requirements for IdM systems that address
privacy issues by data minimization. That is, we focus on the knowledge of personal
information by insiders during normal operation of the system. The list of requirements
is given in Table 1. We distinguish between requirements concerning which
personal information should be learned (non-privacy requirements), and which personal
information should not be learned (privacy requirements). We have elicited these
requirements by analysing a number of IdM systems [8, 26, 72] as well as
taxonomies of privacy requirements [10, 41].

The basic requirement for IdM systems is that the service provider learns the at- 
tributes it needs ifTTIl : attribute exchange (AX). Note that in one service provision, a 
service provider may need attributes from several identity providers. 

Privacy by data minimization attempts to minimise the amount of information
learned, and the extent to which it can be linked together [41]. The first aspect, information
learned, can be further divided into explicitly and implicitly transferred information
[68]. Detectability requirements capture explicitly transferred information:
information about the user's attributes. Involvement requirements capture information
about whether actors know about each other's involvement with the user: a kind of implicitly
transferred personal information. The second aspect is captured in linkability
requirements: namely, that (combinations of) parties should be able to link personal
information from different sessions, databases, etc. as little as possible.

We define three detectability requirements. The first two are about the service provider
learning no more than strictly necessary: no attribute that he does not need to know (irrelevant
attribute undetectability, SID), and no complete attribute value if all he needs
to know is whether or not an attribute satisfies a certain property (property-attribute
undetectability, SPD). These requirements limit the user profile a service provider can
construct. In addition, IdM systems should guarantee that identity providers do not
learn any value or property of attributes stored at other identity providers: we
call this requirement IdP attribute undetectability (ID).
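
The two service-provider requirements can be read as simple set conditions. The sketch below is only an informal reading of SID and SPD, not the formal verification procedure; the function name `detectability_violations` and the attribute names are hypothetical.

```python
def detectability_violations(needed_values, needed_properties, learned_values):
    """Informal reading of SID and SPD as set conditions.

    needed_values:     attributes whose full value the service provider needs
    needed_properties: attributes of which only a property is needed
    learned_values:    attributes whose full value the service provider learned
    """
    # SID: a value was learned for an attribute that is not needed at all.
    sid = learned_values - needed_values - needed_properties
    # SPD: a full value was learned where a property would have sufficed.
    spd = learned_values & (needed_properties - needed_values)
    return {"SID": sid, "SPD": spd}


# Hypothetical disclosure in which every stored attribute is revealed.
full = detectability_violations({"city"}, {"age"}, {"street", "city", "age"})

# Hypothetical disclosure in which only the required value is revealed.
minimal = detectability_violations({"city"}, {"age"}, {"city"})
```

Here `full` reports `street` under SID and `age` under SPD, whereas `minimal` reports no violations. The formal method reasons about what actors can derive from observed messages, which this set-based reading does not capture.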

Involvement requirements address the fact that the mere interaction of a user with
certain identity or service providers implies a business relation which can be privacy-sensitive.
For instance, ownership of credentials can be sensitive [64] in domains such
as healthcare, insurance, or finance. In addition, even if individual credentials are not
sensitive, the precise combination of credentials held by a user may help identify her.
It is natural in identity management that the service provider learns which identity
providers certify the user's attributes: this allows him to judge their correctness. However,
one can aim to achieve that identity providers do not know the identity of other
identity providers the user has an account at [26]: we define this as mutual IdP involvement
undetectability (IM). In the same way, a user might want to keep hidden from her
identity providers the fact that she interacts with a certain service provider: we call this
requirement IdP-SP involvement undetectability (ISM).

Table 1: Requirements for IdM systems

Non-privacy requirements
  Attribute exchange (AX): The service provider knows the value of the required
    attributes/properties of the user requesting the service.
  Anonymity revocation (AR): The service provider and identity providers (possibly
    with help from a trusted third party) can link service access to user profile.

Privacy requirements
  Irrelevant attribute undetectability (SID): The service provider does not know
    anything about attribute values irrelevant to the transaction.
  Property-attribute undetectability (SPD): The service provider does not know
    anything about attributes apart from the properties he needs to know.
  IdP attribute undetectability (ID): Identity providers do not know anything about
    the user's attributes from other identity providers.
  Mutual IdP involvement undetectability (IM): One identity provider does not know
    whether a given user also has an account at another identity provider.
  IdP-SP involvement undetectability (ISM): Identity providers do not learn which
    service providers a user uses.
  Session unlinkability (SL): A service provider cannot link different sessions of
    the same user.
  IdP service access unlinkability (IL): Identity providers cannot link service
    access to the user profile they manage.
  IdP profile unlinkability (IIL): Collaborating identity providers cannot link
    user profiles.
  IdP-SP unlinkability (ISL): Identity providers and service provider cannot link
    service accesses to user profile at identity provider.

Linkability is another fundamental privacy concern because it determines what user 
profiles can be constructed from the data that is collected llrjTl . To prevent a service 
provider from accumulating (behavioural) information, an IdM system should ensure 
it cannot link different service provisions to the same user: session unlinkability (SL). 
Indeed, in many cases the service provider does not need to know the identity of the 
user: for instance, if a user wishes to read an on-line article, the only information that 
is required is that she has a valid subscription. 
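
As a rough illustration of session unlinkability, the sketch below treats two sessions as linkable whenever the service provider observed a common identifier in both. This is a deliberate simplification and an assumption of ours: it considers only directly shared identifiers, and the function name and identifier strings are hypothetical.

```python
from itertools import combinations


def linkable_sessions(observations):
    """Return pairs of sessions that share at least one observed identifier.

    observations maps a session name to the set of identifiers the
    service provider saw in that session.
    """
    linked = []
    for a, b in combinations(sorted(observations), 2):
        if observations[a] & observations[b]:
            linked.append((a, b))
    return linked


# The same long-lived identifier appears in both sessions.
static_ids = {"s1": {"long-lived-id"}, "s2": {"long-lived-id"}}

# A fresh random identifier is used in every session.
fresh_ids = {"s1": {"sid-a41f"}, "s2": {"sid-09c2"}}
```

With `static_ids` the two sessions are reported as linkable, while with `fresh_ids` no pair is returned, which corresponds to SL holding in this simplified reading.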

Another concern is that parties can build more comprehensive user profiles by shar- 
ing their personal information. To prevent this, they should not know which profiles are 
about the same user [41]. A very strong privacy guarantee in this vein is that identity
providers and service providers cannot link service provisions to the user: IdP-SP unlinkability
(ISL). IdP profile unlinkability (IIL) is a weaker privacy guarantee requiring
that two collaborating identity providers (without help from the service provider) cannot
link their profiles. IdP service access unlinkability (IL) is about the link between a
service provision and the user profile at an identity provider, thus measuring whether 
identity providers are aware of individual service provisions. 

In practice, the ISL requirement is problematic for accountability reasons: if the 
user misbehaves, it should be possible to identify her [8]. Several IdM systems [8, 72]
introduce a trusted third party that, in such cases, can help with the identification. 
The anonymity revocation (AR) requirement states that, possibly with the help of this 
trusted third party, the service provider and identity providers are able to revoke the 
anonymity of a transaction. (Note that in particular, AR also holds if the service 
provider and identity providers can revoke anonymity without needing the trusted third 
party.) 

2.3 Case Study 

This section introduces a case study which is used to present the IdM systems studied
in this paper. In particular, we consider a scenario with four main actors:

• a user: Alice, a 65-year-old woman;

• a service provider: an e-book store; 

• two identity providers: one for Alice's address (the address provider) and one 
for Alice's subscription at some society (the subscription provider). 

Registration phase. Alice creates an account at both identity providers. The address
provider stores three identity attributes of the user: the street, city, and age. The subscription
provider stores two user attributes: date of subscription and subscription type.

Service provision phase. On two separate occasions, Alice purchases books from the
e-book store. To this end, she needs to provide her personal information, endorsed by
the identity providers, to the e-book store. The service provider, for statistical purposes,
demands to know the city that Alice comes from. Moreover, the e-book store offers
a discount to customers that are over 60 years old. As Alice is 65 years old, she is
eligible for the discount. The e-book store, however, does not necessarily need to learn
her exact birth date or age; Alice can just prove that she is over 60 years old. Moreover,
the e-book store does not need to know that the purchases are both made by the same
user. On the other hand, in case of abuse, the service provider does want to be able
to link the purchase to Alice's profile at the address provider with the help of a trusted
third party. (Note that the case study does not cover the separate issue of anonymous
payment of the e-book.)
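
The data the e-book store actually needs from the address provider can be written down directly. The following sketch only illustrates the intended minimal disclosure; the street and city values and the function `minimal_disclosure` are hypothetical, and computing the property locally says nothing about how it is endorsed or proven in a real system.

```python
# Alice's profile at the address provider (street and city are made-up values).
ADDRESS_PROFILE = {"street": "1 Example Street", "city": "Exampletown", "age": 65}


def minimal_disclosure(profile, threshold=60):
    """Return only what the e-book store needs: the city and an age property."""
    return {
        "city": profile["city"],
        "over_threshold": profile["age"] > threshold,
    }


disclosed = minimal_disclosure(ADDRESS_PROFILE)
```

The result contains the city and the fact that Alice is over 60, but neither her street nor her exact age.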

2.4 Four Systems 

In this work, we analyse four IdM systems from the literature, representative of the 
types in Figure[T] We consider one traditional system, smart certificates [59], for whose 
development privacy was not a primary concern; it can be classified as credential- 
focused and relying on cryptography. We then consider three systems designed with 
privacy in mind: the linking service model G6l . a relationship-focused IdM system; 
Identity Mixer j8], a credential-focused system relying on cryptographic protocols; 
and a credential-focused IdM system based on smartcards ll72l we will refer to as the 
Smartcard scheme. We now briefly discuss these systems. 
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2.4.1 Smart Certificates 



Park et al. [59] proposed an IdM system built on top of a Public Key Infrastructure
(PKI). In a PKI, a certificate authority (CA) issues certificates stating that a certain
public key belongs to a certain user. A user authenticates by proving knowledge of the
secret key corresponding to this public key. Identity providers issue certificates that
link attributes to the public key certificate. In our analysis, we consider one particular
variant described in [59]: the user-pull model with long-lived certificates obtained
during registration.

The flow of information is summarised in Figure 2. In the registration phase (Figure 2(a)),
the user gets an attribute certificate from the identity provider (the "attribute
server" in [59]), which enables her to present her attributes to others. This involves
three steps: (1) the user presents her public key certificate; (2) she proves that she also
knows the corresponding secret key (this is an interactive protocol shown as a two-sided
arrow in the figure); and (3) the attribute server issues an attribute certificate. The
process is then repeated with the other identity provider (steps (4) to (6)). The attributes
in the certificate are signed using the attribute server's secret key and hence cannot be
tampered with by the user. During service provision (Figure 2(b)), the user exchanges
attributes with the service provider ("web server") in two steps: (1) she presents her
public key certificate and the attribute certificates containing the attributes needed; and
(2) she proves knowledge of the corresponding secret key.

The system presented in [59] is mainly designed to satisfy the attribute exchange
requirement (AX) in a secure way ("the attributes of individual users are provided securely").
Privacy concerns are addressed in an extension of the system (not considered
here) in which some attributes in a credential are encrypted in such a way that they can
only be read by an "appropriate" server.

2.4.2 Linking Service Model

The linking service model [26] is a relationship-focused IdM system. Its main goal
is to facilitate the collection of user attributes from different identity providers in a 
privacy-friendly way without the user having to authenticate to each identity provider 
separately. To this end, this model includes a linking service which is responsible for 
holding the links between profiles of the user at the different identity providers without 
knowing any personal information about the user. 

The flow of information is summarised in Figure 3. During registration (Figure 3(a)),
the user first creates an anonymous account at the linking service LS. LS
requests the identity providers, IdP1 and IdP2, to authenticate the user; each identity
provider generates a pseudonym for the user and sends it to LS (steps (1) and (2)).
(The specific method of authentication between the user and the identity providers and
linking service is out of our scope.) In the service provision phase (Figure 3(b)), the
user authenticates to IdP1. IdP1 provides the service provider SP with an "authentication
assertion" containing the attributes requested from it, and a referral to LS (1).
The referral is an encryption of the pseudonym shared between IdP1 and LS that only
LS can decrypt. SP sends this referral to LS (2), which responds by sending a similar
referral to IdP2 (3). Finally, SP requests (4) and obtains (5) the required attributes from
IdP2.
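
The service provision steps above can be traced with a small simulation. This is a sketch under strong assumptions: encryption is replaced by a `Sealed` wrapper that simply refuses to open for the wrong party, and all class names, pseudonyms, and attribute values are hypothetical rather than taken from [26].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Stand-in for a value encrypted so that only `recipient` can read it."""
    recipient: str
    payload: str

    def open(self, party: str) -> str:
        if party != self.recipient:
            raise PermissionError(f"{party} cannot decrypt this referral")
        return self.payload


class LinkingService:
    """Holds links between pseudonyms, but no user attributes."""

    def __init__(self):
        self._links = {}  # pseudonym -> {identity provider name: pseudonym}

    def register(self, pseudonyms: dict) -> None:
        for pseudonym in pseudonyms.values():
            self._links[pseudonym] = dict(pseudonyms)

    def refer(self, referral: Sealed, target_idp: str) -> Sealed:
        pseudonym = referral.open("LS")
        return Sealed(target_idp, self._links[pseudonym][target_idp])


class IdentityProvider:
    def __init__(self, name, attributes_by_pseudonym):
        self.name = name
        self._attributes = attributes_by_pseudonym

    def assertion(self, pseudonym, requested):
        """Step (1): requested attributes plus a referral only LS can open."""
        attrs = {k: self._attributes[pseudonym][k] for k in requested}
        return attrs, Sealed("LS", pseudonym)

    def attributes_for(self, referral, requested):
        """Steps (4) and (5): answer a referral addressed to this provider."""
        pseudonym = referral.open(self.name)
        return {k: self._attributes[pseudonym][k] for k in requested}


idp1 = IdentityProvider("IdP1", {"p1": {"city": "Exampletown"}})
idp2 = IdentityProvider("IdP2", {"p2": {"subscription_type": "gold"}})
ls = LinkingService()
ls.register({"IdP1": "p1", "IdP2": "p2"})  # registration, steps (1) and (2)

attrs1, referral_ls = idp1.assertion("p1", ["city"])                # (1)
referral_idp2 = ls.refer(referral_ls, "IdP2")                       # (2), (3)
attrs2 = idp2.attributes_for(referral_idp2, ["subscription_type"])  # (4), (5)
collected = {**attrs1, **attrs2}
```

In this simulation the service provider ends up with attributes from both identity providers, while it only ever handles sealed referrals and the linking service stores pseudonyms but no attributes.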

The linking service model aims to satisfy the attribute exchange requirement (AX)
as well as a number of privacy requirements [26]. In particular, the main goal of the
linking service model is to guarantee that identity providers do not know the involvement
of other identity providers (IM). Moreover, the model aims to achieve session
unlinkability (SL) through the use of random user identifiers. Finally, the linking service
should not learn the partial identities of the user for the service providers; that is,
it does not learn any personal information about the user. We call this requirement LS
attribute undetectability (LD); it is not listed in Table 1 because it is only relevant for
this system; however, our analysis will include the verification of this requirement.



2.4.3 Identity Mixer 

Identity Mixer [8] is a credential-focused IdM system using a cryptographic primitive
called anonymous credentials. These credentials link attributes to a user
identifier, but are issued by identity providers and shown to service providers using
protocols ensuring that neither party learns that identifier. Thus, nobody but the user
knows whether different issuing or showing protocols were performed by the same
user, while integrity of the attributes is still assured.

Figure 4 shows the information flows in Identity Mixer. During registration (Figure 4(a)),
the user first sends a commitment to her (secret) identifier to the first identity
provider IdP1 (1), after which the user and IdP1 together run the credential issuing
protocol (2). From this, the user obtains a credential with her attributes linked to her
secret identifier, without IdP1 learning the identifier. Communication with the second
identity provider IdP2 is analogous (steps (3) and (4)). In the service provision phase
(Figure 4(b)), the user shows information from both credentials to the service provider
SP. She first shows her credential from IdP1. To this end, she sends a message containing
the attributes she wants to reveal, and "commitments" to the secret identifier and all
other attributes (1). Next, she performs a zero-knowledge proof (2) which proves to SP
that the attributes and commitments come from a valid credential issued by IdP1, while
revealing nothing else about the credential. The credential issued by IdP2 is shown in
the same way (steps (3) and (4)).
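
To give some intuition for why commitments to the same secret identifier do not link sessions, the sketch below uses a salted-hash commitment. This is an assumption made for illustration only: it is not the commitment scheme used by Identity Mixer, and it does not support the zero-knowledge proofs described above.

```python
import hashlib
import hmac
import secrets


def commit(value: bytes):
    """Return (commitment, opening) for a salted-hash commitment to value."""
    opening = secrets.token_bytes(32)  # fresh randomness for every commitment
    commitment = hashlib.sha256(opening + value).digest()
    return commitment, opening


def verify(commitment: bytes, opening: bytes, value: bytes) -> bool:
    """Check that the commitment opens to value under the given opening."""
    expected = hashlib.sha256(opening + value).digest()
    return hmac.compare_digest(commitment, expected)


secret_identifier = b"hypothetical-user-secret"
c1, o1 = commit(secret_identifier)  # e.g. sent in one session
c2, o2 = commit(secret_identifier)  # e.g. sent in another session
```

Because each commitment uses fresh randomness, `c1` and `c2` differ even though they commit to the same identifier, yet each one can only be opened to that identifier.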

Identity Mixer is designed to satisfy a number of privacy requirements [8]. In
particular, it aims to satisfy both profile unlinkability and IdP/SP unlinkability (together
called "multi-show unlinkability" in [8]) and irrelevant attribute and property-attribute
undetectability (together called "selective show of data items" in [8]). The system
allows for providing the service provider with an encryption of some attributes for a
trusted third party ("conditional showing of data items" in [8]) that can be used for
anonymity revocation. Finally, the system allows credential issuing where an identity
provider copies attributes from another certificate without knowing their values ("blind
certification" in [8]). The main motivation for this functionality comes from the use of
these certificates for e-cash [8]. In traditional identity management scenarios, such as
our case study, identity providers should know the attributes they endorse, so we do not
consider this requirement in this work.

2.4.4 Smartcard Scheme 



Vossaert et al. [72] proposed a credential-focused IdM system which relies on PKI
for authentication and on smartcards (or other tamper-resistant devices) to ensure that
attributes are neither modified nor observed during their transmission from the identity
provider to the service provider. Identity providers and service providers only communicate
via the smartcard, and each has a different pseudonym of the user based on a
secret user identifier stored on the smartcard.

The information flow defined in the scheme is shown in Figure 5. In the registration
phase (Figure 5(a)), the smartcard SC and the first identity provider IdP1 establish a
secure, authenticated channel using key agreement (steps (1) and (2)). Over this secure
channel, SC sends a pseudonym based on its secret identifier specific for IdP1 (3);
IdP1 sends its attributes (4). Registration at the other identity provider IdP2 is similar
(steps (5) to (8)). In a service provision (Figure 5(b)), SC and service provider SP
establish a secure, authenticated channel as in the registration phase (steps (1) and
(2)). SC generates a random session identifier (3); SP then specifies what attributes he
wants, and how long they may have been cached (steps (4) and (5)). SC responds by
giving the requested attributes. For anonymity revocation purposes, this response also
includes Alice's identifier encrypted for the trusted third party (6).

The system is designed to meet several requirements related to the knowledge of 
personal information [72]. The requirements specified correspond to our notions of
attribute exchange, profile unlinkability, and anonymity revocation. Irrelevant property 
and property-attribute undetectability follow from their more general notion of "re- 
stricting released personal data". The Smartcard scheme also aims to fulfil IdP profile 
unlinkability and IdP/SP unlinkability by preventing collusion of identity and service 
providers. 

2.4.5 Privacy Requirements Claimed by Systems 

Table [2] summarises the privacy claims for the systems. In this work, we will formally 
verify whether these claims actually hold. In addition, we will analyse the systems 
against the complete range of identified requirements in order to achieve a compre- 
hensive comparison of their privacy features. The formal methods presented in the 
next section will allow us both to verify claimed requirements, and to find out which 
non-claimed requirements still hold. 

3 Formal Analysis of Personal Information Knowledge 

This section presents a formal framework for the analysis of IdM systems, extending 
the work in [70, 71]. In [70], we described how to express knowledge of personal
information in terms of items and links between them. In [71], by extending this idea,
we modelled the knowledge of personal information arising from observed messages 
using a limited set of cryptographic primitives. 

In this paper, we extend this model by considering properties of user attributes, 
additional primitives (digital signatures with appendix, labelled encryption, authenti- 
cated key exchange, and anonymous credentials), and cryptographic protocols (zero- 
knowledge proofs and issuing protocols for anonymous credentials). In addition, by 
introducing traces, we formalise knowledge evolution by the transmission of messages 
between different parties. Finally, we present a Prolog implementation of the above 
formal model. 

3.1 Three-Layer Model 

We now present a formal model of actor knowledge. 

3.1.1 Personal Information

A piece of personal information in the digital world is a specific string that has a specific 
meaning as personal information about a specific person. We distinguish between two 
types of digital personal information: identifiers and data items. Identifiers are unique 
within the system; for data items this is not necessarily the case. The sets of identifiers 
and data items are denoted $\mathcal{I}$ and $\mathcal{D}$, respectively. The set $\mathcal{E}$ of entities models the
real-world persons whom the considered information is about.

The link between the information and its subject is captured by the related relation,
denoted $\leftrightarrow$. This is an equivalence relation on entities, identifiers and data items, such
that $o_1 \leftrightarrow o_2$ means that $o_1$ and $o_2$ are information about the same person. In particular,
any identifier or data item is related to exactly one entity. Elements of the set
$\mathcal{O} := \mathcal{E} \cup \mathcal{I} \cup \mathcal{D}$ are called items of interest.

We give additional structure to the above sets in two respects. First, private and
public keys are particular cases of identifiers: they form sets $\mathcal{K}^- \subset \mathcal{I}$ and
$\mathcal{K}^+ \subset \mathcal{I}$, respectively. Given a private key $k^-$, the corresponding public key is $k^+$
and vice versa. Second, we express that data items satisfy certain "properties" from a
fixed set $\{\psi_1, \ldots, \psi_n\}$ relevant to the application domain. Suppose that $\psi_1$ represents
the property stating that an age is over 60. Then, $\psi_1(age_c) \in \mathcal{D}$ expresses that $age_c$ is
an age that is over 60; $\psi_1(age_d) \notin \mathcal{D}$ and $\psi_1(city_c) \notin \mathcal{D}$ state that the property does
not hold (or has no meaning) for the given data item.
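
As an illustration only, the information layer can be mimicked with a few lines of Python. This is a minimal sketch: the names `subject_of`, `related`, `over_60`, and the sample items and values are assumptions made for the example and are not part of the formal model or of the Prolog implementation mentioned above.

```python
# Minimal sketch of the information layer (all names are illustrative).
# Every identifier or data item is related to exactly one entity, so the
# "related" relation can be stored as a map from items to their subject.
subject_of = {
    "alice": "alice",       # an entity is related to itself
    "id_alice": "alice",    # identifier
    "age_c": "alice",       # data items
    "city_c": "alice",
    "bob": "bob",
    "age_d": "bob",
}

def related(o1, o2):
    """Two items of interest are related iff they concern the same person."""
    return subject_of[o1] == subject_of[o2]

# Hypothetical attribute values, used only to evaluate the property.
values = {"age_c": 65, "age_d": 30, "city_c": "Eindhoven"}

def over_60(item):
    """Sample property: holds only for numeric ages above 60."""
    value = values.get(item)
    return isinstance(value, int) and value > 60
```

Here `over_60("city_c")` is false because the property has no meaning for a city, mirroring the case where a property applied to a data item is not itself a data item.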

These concepts, however, are insufficient to model all requirements for IdM sys-
tems we are interested in. Several requirements in Table [1] are about whether or not
an actor (or set of actors) can "link" two pieces of personal information to each other, 
i.e., whether or not the actor knows they are related. However, expressing this becomes 
problematic when an actor learns the same piece of information in two different con- 
texts without realizing that it is the same information. For instance, in our case study, 
the e-book store will learn two profiles containing data items $city_{al}$, $\psi_1(age_{al})$. To ex-
press session unlinkability, the model should be able to express the difference between
the "instances" learned in the first and second profile: the store can link the instance
of $city_{al}$ in profile 1 to the instance of $\psi_1(age_{al})$ in profile 1, but not to the instance in
profile 2. Thus, the above model needs to be extended to distinguish between different 
instances of the same piece of personal information. 

In addition, an actor may be able to deduce information from the fact that different 
pieces of information have the same string contents. For instance, suppose that in a 
service provision, the user's identifier is transmitted using deterministic encryption. 
Then, although an observer may not be able to determine the identifier, he still knows 
that different service provisions involved the same user. 

3.1.2 Three-Layer Model 

Because of the need to distinguish different instances of the same piece of information, 
but also to reason about message contents, we introduce a three-layer representation 
of personal information. The representation consists of the object layer, information 
layer, and contents layer. At the information layer, as described above, the information 
itself is represented, e.g., "Alice's city". At the object layer, information is described in 
terms of the context in which it has been observed, e.g., "the city of the user in service 
provision #1". At the contents layer, information is described in terms of the strings 
actually transmitted in a protocol, e.g., "Eindhoven". 

Figure 6: Example of the three-layer model: three different context items with the
information and contents they represent (left); the three-layer model of information
(right)

At the object layer, we model the context in which an actor knows pieces of information.
A context is a tuple $(\eta, k)$, where $\eta$ is a domain and $k$ is a profile within
that domain. A domain is any separate digital "place" where personal information is 
stored or transmitted; in our case study, the databases held by the identity and service 
providers are domains; also every instance of a communication protocol can be seen 
as a domain. Profiles contain personal information about one user in a domain. For 
instance, databases contain several user profiles; moreover, every logical role in a pro- 
tocol instance is a different profile (though different roles could be performed by the 
same entity, e.g. an identity provider that also acts as service provider). 

In such a context, pieces of information are represented by variables. This repre-
sentation makes it possible to reason about such personal information without regard-
ing the instantiation. Identifier variables represent identifiers (set $I$), whereas data item
variables represent data items (set $D$): e.g., a variable $city \in D$ may denote the city in
a profile. A context data item is a data item variable $d$ in a context $(\eta, k)$, denoted
$d|^\eta_k \in \mathcal{D}^c$; the set $\mathcal{I}^c$ of context identifiers is defined similarly. Entities are not repre-
sented by variables; instead, an entity $e \in \mathcal{E}$ in a context $(\eta, k)$ is denoted $e|^\eta_k$; the set
of context entities is $\mathcal{E}^c$. The reason is that, because entities are not digital information,
there cannot be multiple "instances" of an entity. Every context contains exactly one
entity who is the data subject, i.e., all information in the context is about that entity.
$\mathcal{O}^c := \mathcal{E}^c \cup \mathcal{I}^c \cup \mathcal{D}^c$ is the set of context items of interest. The structure of private/public
keys and properties transfers to the object layer: context private and public keys are
modelled by sets $\mathcal{K}^{c-}, \mathcal{K}^{c+} \subset \mathcal{I}^c$; a property $\psi_i$ holding for a context data item $d$ is denoted
$\psi_i(d) \in \mathcal{D}^c$.

Items at the contents layer can be seen as strings of arbitrary length in some al-
phabet, i.e., the set $\Sigma^*$. The exact form of the contents layer is not relevant for our
purposes. Rather, it is relevant to determine whether two pieces of information have
the same contents: this is expressed using the $\tau$ function, as described below.

3.1.3 Maps Between Layers and Equivalence 

The link between the object layer and the information layer is given by the substitution
function $\sigma : \mathcal{O}^c \to \mathcal{O}$. This function satisfies the following properties:

1. $\sigma(\mathcal{D}^c) \subset \mathcal{D}$, $\sigma(\mathcal{I}^c) \subset \mathcal{I}$; for any entity $e$, context $(\eta, k)$: $\sigma(e|^\eta_k) = e$;

2. $\sigma(x|^\eta_k) \leftrightarrow \sigma(y|^\eta_k)$ for any context items $x|^\eta_k$, $y|^\eta_k$;

3. $\sigma(\mathcal{K}^{c+}) \subset \mathcal{K}^+$, $\sigma(\mathcal{K}^{c-}) \subset \mathcal{K}^-$, and $\sigma(key^+|^\eta_k) = k^+$ if and only if $\sigma(key^-|^\eta_k) = k^-$;

4. $\psi_i(d) \in \mathcal{D}^c$ iff $\psi_i(\sigma(d)) \in \mathcal{D}$.

Intuitively, $\sigma$ maps: 1. context items to information items of the corresponding type;
2. context items from the same context to related items of interest; 3. private/public key
context items to private/public keys; 4. property context items to properties.

The link between information and its contents is given by the function $\tau$. The
domain of the function is $\mathcal{I} \cup \mathcal{D}$ (entities have no contents). The function $\tau$ satisfies
two properties. First, it is injective on $\mathcal{I}$: this formally expresses the uniqueness of
identifiers within the system. Second, if $\psi_i(d), \psi_i(d') \in \mathcal{D}$, then $\tau(\psi_i(d)) = \tau(\psi_i(d'))$;
that is, the contents of an attribute property are independent from the attribute value.

We introduce notation for context items $x|^\eta_k$, $y|^\chi_l$ representing the same informa-
tion or contents. If $\sigma(x|^\eta_k) = \sigma(y|^\chi_l)$, then we write $x|^\eta_k \equiv y|^\chi_l$ and we call $x|^\eta_k$ and $y|^\chi_l$
equivalent. If $\tau(\sigma(x|^\eta_k)) = \tau(\sigma(y|^\chi_l))$, then we write $x|^\eta_k \doteq y|^\chi_l$ and we call them content
equivalent. Clearly, equivalence implies content equivalence. Because $\tau$ is injective on
identifiers, two identifiers are equivalent iff they are content equivalent.

Example 1. Consider three context data items $city|^\pi_1$, $city|^\eta_1$, and $city|^\chi_1$ (Figure [6], left
side) where $city \in D$. Let $\sigma(city|^\pi_1) = \sigma(city|^\eta_1) = city_c$, $\sigma(city|^\chi_1) = city_d$, and
$\tau(city_c) = \tau(city_d) =$ "Eindhoven". Then, $city|^\pi_1$ and $city|^\eta_1$ are equivalent; moreover, all three
context data items are content equivalent. □
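
The two maps and the two equivalence notions can be sketched in Python as follows. This is a hedged illustration of Example 1, not the authors' implementation: the class `ContextItem`, the dictionaries `sigma` and `tau`, and the domain labels "pi", "eta", "chi" are assumptions chosen for the example.

```python
from typing import NamedTuple

class ContextItem(NamedTuple):
    """Object-layer item: a variable observed in a (domain, profile) context."""
    variable: str
    domain: str
    profile: str

# Assumed sample data: three observations of a city in three contexts.
c1 = ContextItem("city", "pi", "1")
c2 = ContextItem("city", "eta", "1")
c3 = ContextItem("city", "chi", "1")

# sigma: object layer -> information layer.
sigma = {c1: "city_c", c2: "city_c", c3: "city_d"}
# tau: information layer -> contents layer.
tau = {"city_c": "Eindhoven", "city_d": "Eindhoven"}

def equivalent(x, y):
    """Same piece of information (same image under sigma)."""
    return sigma[x] == sigma[y]

def content_equivalent(x, y):
    """Same string contents (same image under tau after sigma)."""
    return tau[sigma[x]] == tau[sigma[y]]
```

With this data, `c1` and `c2` are equivalent, `c1` and `c3` are not, and all three are pairwise content equivalent, so equivalence implies content equivalence but not the other way round.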

3.2 Actor Knowledge: Detectability and Associability 

We now show how knowledge of personal information by an actor, or by a coalition 
of actors, is derived from messages he has observed. An actor is an entity with a view 
on the system; the set of actors in the system is denoted $\mathcal{A} \subset \mathcal{E}$. The requirements
in Table [1] can be analysed by determining (1) what context items the actor(s) can
detect; and (2) which of these items the actor(s) can associate. A basic version of the
following analysis method is described in [70]; we improve it by modelling additional
cryptographic primitives: labelled encryption, zero-knowledge proofs, authenticated
key agreement, and anonymous credentials and their issuing protocols. We also model
reasoning about attribute properties. Finally, we model digital signatures with appendix
[53], whereas [70] models clear-signing. These primitives are necessary to analyse the
selected IdM systems.

3.2.1 Communication Messages 

Communication in identity management protocols involves messages built up from 
personal and other information using cryptographic primitives such as encryption, sig-
natures, and hashes. We model these messages by means of the set $\mathcal{C}^c$ of context
messages. The basic building blocks of messages are context data items $\mathcal{D}^c$ and identi-
fiers $\mathcal{I}^c$, and non-personal information, such as shared keys and nonces, from $\mathcal{G}^c$. Items
in $\mathcal{G}^c$ belong to a domain, but not to a user profile: in this case we denote the profile as
$\cdot$, e.g. $shakey|^\eta_\cdot$. The three sets together form the set $\mathcal{P}^c := \mathcal{D}^c \cup \mathcal{I}^c \cup \mathcal{G}^c$ of context items.

The set $\mathcal{C}^c$ is built up from $\mathcal{P}^c$ with the grammar shown in Table [3]. The concate-
nation, hash, and (a)symmetric encryption primitives are standard [29, 37]. Digital
signatures are "with appendix" [53]: that is, an actor needs to know the message that
was signed in order to verify the signature. Labelled asymmetric encryption [8] is
asymmetric encryption to which a label is unmodifiably attached at encryption time.

Messages                                      Meaning

$M, M_i ::= \emptyset \mid p \mid$                     empty message, atomic message
$\{M_1, M_2\} \mid$                                 (associative) concatenation of messages $M_1$ and $M_2$
$S_{k^-}(M) \mid$                                  digital signature of message $M$ with private key $k^-$
$\mathcal{H}(M) \mid$                               hash of message $M$
$E'_{M_1}(M_2) \mid$                               symmetric encryption of message $M_2$ with key $M_1$
$E_{k^+}(M) \mid$                                  asymmetric encryption of message $M$ with public key $k^+$
$E_{k^+}(M_1)_{M_2} \mid$                          labelled asymmetric encryption of message $M_1$ with public key $k^+$ and label $M_2$
$AKA(k_1^-; M_1; k_2^-; M_2) \mid$                 derived key from authenticated key agreement (AKA) with (SK, randomness) pairs $(k_1^-, M_1)$ and $(k_2^-, M_2)$
$cred^{M_1}_{k^-}(M_2; M_3) \mid$                  anonymous credential with user identifier $M_1$, issuer private key $k^-$, attributes $M_2$, and randomness $M_3$
$ZK(M_1; M_2; M_3; M_4) \mid$                      zero-knowledge proof of knowledge of secret $M_1$ with properties $M_3$ using public information $M_2$ and randomness $M_4$
$ICred^{M_1}_{k^-}(M_2; M_3)$                      issuing protocol for anonymous credential $cred^{M_1}_{k^-}(M_2; M_3')$, where $M_3'$ is derived from $M_3$

Table 3: The grammar $\mathcal{C}^c$ of context messages: $p \in \mathcal{P}^c$; $k^+ \in \mathcal{K}^{c+}$; $k^-, k_i^- \in \mathcal{K}^{c-}$; $\emptyset$ is an
empty message.



For instance, the label can represent a policy specifying when the recipient is allowed 
to decrypt the data. 
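
To make the grammar more concrete, the following Python sketch encodes a fragment of it as immutable data classes. It is an illustration under stated assumptions: the class names (`Atom`, `Concat`, `Hash`, `SymEnc`, `LabelledEnc`) and the helper `atoms` are hypothetical, and only some of the primitives of Table 3 are covered.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    name: str              # atomic message p, e.g. a context item

@dataclass(frozen=True)
class Concat:
    parts: tuple           # concatenation {M1, M2, ...}

@dataclass(frozen=True)
class Hash:
    body: object           # hash of a message

@dataclass(frozen=True)
class SymEnc:
    key: object            # symmetric encryption of `body` under `key`
    body: object

@dataclass(frozen=True)
class LabelledEnc:
    public_key: object     # asymmetric encryption with an attached label
    body: object
    label: object

def atoms(msg):
    """Collect the names of all atomic messages occurring in `msg`."""
    if isinstance(msg, Atom):
        return {msg.name}
    if isinstance(msg, Concat):
        return set().union(*(atoms(p) for p in msg.parts))
    if isinstance(msg, Hash):
        return atoms(msg.body)
    if isinstance(msg, SymEnc):
        return atoms(msg.key) | atoms(msg.body)
    if isinstance(msg, LabelledEnc):
        return atoms(msg.public_key) | atoms(msg.body) | atoms(msg.label)
    raise TypeError(f"not a message: {msg!r}")
```

Frozen data classes are hashable and compare structurally, which matches the intent that a message is fully determined by its syntactic structure.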

Authenticated key agreement (AKA) [49] allows two parties to derive a unique
session key based on secret keys and randomness contributed by both parties. We
consider the variant presented in [49] in which both parties send each other a random
value. Both parties can determine the session key, modelled by the AKA primitive,
from one private key, the other public key, and the randomness.

Finally, the cred primitive models anonymous credentials [8]. An anonymous cre-
dential $cred^{M_1}_{k^-}(M_2; M_3)$ represents an endorsement with private key $k^-$ that the at-
tributes $M_2$ belong to the user with identifier $M_1$, randomised using $M_3$. Using the
protocols we describe next, anonymous credentials can be issued without the issuer
obtaining the credential or learning $M_1$; also, their ownership can be proven efficiently
without revealing the credential itself.

Several two-party cryptographic protocols occur in the systems presented in Sec-
tion 2.4. These protocols only have meaning when looked at as a whole, i.e., the
meaning lies not in individual messages, but in their combination in a particular order. 
Thus, we model the complete transcript (i.e., all messages of all participants) of such a 
protocol as one primitive. We introduce two such primitives. 

First, we model a family of zero-knowledge (ZK) proofs (e.g., [30]) by means of
the ZK primitive. In a ZK proof for a given property, a prover wants to convince a
verifier that he knows some secrets satisfying that property with respect to some given
public information, without revealing anything about the secrets. Here, we consider
ZK proofs proving that (1) the public information has a certain message structure with
respect to the private information, and (2) some secret attributes $d_i$ have some proper-
ties $\psi_k(d_i)$ to be verified. For instance, $ZK(\{d, n\}; \mathcal{H}(\{d, n\}); \psi_2(d); n')$ denotes a ZK
proof (using randomness $n'$) convincing a verifier knowing the hash $\mathcal{H}(\{d, n\})$ that the
prover knows the pre-image $\{d, n\}$ of the hash, and that $\psi_2(d)$ is satisfied; without the
verifier learning anything else about $d$ or $n$. See Appendix A.1 for a detailed discussion.

$$(\vdash T)\quad \frac{C_a \vdash m \qquad C_a \vdash n'}{C_a \vdash n}\ \ \big(m \stackrel{?}{\Rightarrow} n;\ n \doteq n'\big)$$

$$(\vdash C)\quad \frac{C_a \vdash n_1 \qquad C_a \vdash m_1 \qquad C_a \vdash m_2}{C_a \vdash n_2}\ \ \big((m_1 \doteq m_2) \Rightarrow (m_3 \doteq m_4);\ n_1 =_{m_3 \sim m_4} n_2\big)$$

Figure 7: Deductive system: general rules, showing the testing and content analysis rules
($C_a$ a set of context messages; $m$, $m_i$, $n$, $n'$, $n_i$ context messages; $d$ a context data item;
$n_1 =_{m_3 \sim m_4} n_2$ means $n_1$ and $n_2$ are equal up to replacing $m_3$ by $m_4$ and vice versa)

Second, we model the issuing protocol for anonymous credentials [8] by means of
the ICred primitive. This protocol is run between a user and an issuer. In advance,
both parties need to know the attributes to be certified, but only the user needs to know 
the identifier to which the attributes are issued. As a result of the protocol, the user 
obtains an anonymous credential linking the attributes to the identifier. The issuer does 
not learn the credential; moreover, because he does not know the identifier, he cannot 
issue credentials in her name without her involvement. Also, by using ZK proofs for 
proving ownership, the identifier is kept secret with respect to the service provider, thus 
preventing linkage. See Appendix A. 2 for details. 

Analogously to messages at the object layer, we also consider messages at the
information and contents layers, and extend $\sigma$ and $\tau$ to messages. Thus, $\sigma$ maps
grammar elements of $\mathcal{C}^c$ to elements of $\mathcal{C}^i$ defined by the same grammar, but in-
stead generated by $\mathcal{P} = \mathcal{D} \cup \mathcal{I} \cup \mathcal{G}$, where $\mathcal{G}$ is the information represented by items
in $\mathcal{G}^c$. E.g., if $\sigma(city|^\pi_1) = city_c$ and $\sigma(city|^\chi_1) = city_d$, then
$\sigma(\{\mathcal{H}(city|^\pi_1), city|^\chi_1\}) = \{\mathcal{H}(city_c), city_d\}$. A similar set $\mathcal{C}^{cnt}$ and map
$\tau : \mathcal{C}^i \to \mathcal{C}^{cnt}$ are defined for contents.
When contexts, domains, and profiles are applied to messages, they apply to all con-
text items in the message, e.g., $E'_{shakey|_\cdot}(city|_1)|^\eta := E'_{shakey|^\eta_\cdot}(city|^\eta_1)$ and
$\{id, city\}|^\eta_k := \{id|^\eta_k, city|^\eta_k\}$. Like context items, context messages $m$ and $n$ are equivalent iff
$\sigma(m) = \sigma(n)$, and content equivalent iff $\tau(\sigma(m)) = \tau(\sigma(n))$. The three-layer model of mes-
sages is shown in Figure [6] (right side).

Messages in our model satisfy two important properties: their contents are deter-
ministic and unique. By deterministic, we mean that given the same contents as input,
cryptographic primitives always give the same output (i.e., $\tau$ is well-defined). Random-
ness, e.g. in signing or in non-deterministic encryption, can be modelled explicitly as
part of the plaintext. This makes it possible to distinguish the case where an actor ob-
serves two different randomised encryptions with the same input from the case where
he observes the same randomised encryption twice; in the latter case, we will allow an
actor to draw certain conclusions from this. Uniqueness is expressed by the structural
equivalence assumption [71]. Note that elements of $\mathcal{C}^{cnt}$ could a priori be the same as
strings, e.g. a collision in the hash function could cause $\mathcal{H}(\tau(x))$ and $\mathcal{H}(\tau(y))$ to be
the same string even if $\tau(x) \neq \tau(y)$; or $E'_{\tau(x)}(\tau(y))$ could happen to be the same string
as $\mathcal{H}(\tau(z))$. We assume that this does not happen, i.e., the grammar $\mathcal{C}^{cnt}$ uniquely
represents message contents. (Of course, different elements of $\mathcal{C}^i$ can still map to the
same element in $\mathcal{C}^{cnt}$ if the information items have the same contents.)
Figure 8: Deductive system: inference rules for basic primitives ($C_a$ a set of context
messages; $m$, $n$, $n_i$ context messages; $k^+/k^-$ and $k_i^+/k_i^-$ public/private key pairs)



3.2.2 A Deductive System For Detectability 

We now formalise what context items an actor (or a coalition of actors) can detect. For
actor $a \in \mathcal{A}$ knowing context messages $C_a \subset \mathcal{C}^c$, $C_a \vdash m$ means that $a$ can derive context
message $m$. The semantics of $\vdash$ is defined by means of the deductive system given in
Figures [7], [8], [9], and [10]. Context items $p \in \mathcal{P}^c$ such that $C_a \vdash p$ are called detectable by $a$.
For a coalition $A = \{a_1, \ldots, a_n\} \subset \mathcal{A}$ of actors, $\vdash$ is applied to the union $C_A = C_{a_1} \cup \ldots \cup C_{a_n}$
of their respective message sets.

Deductive systems are commonly used in protocol analysis to reason about what 
messages an attacker can fabricate (e.g., [29, 37]). For this, the context in which a
message is known is not relevant, and thus not considered: i.e., existing deductive 
systems operate at our information layer. Conversely, for our purposes the context is 
relevant; hence we extend standard deductive systems to the object layer. 

Figure [7] presents the rules of the deductive system that are not about particular
primitives. ($\vdash$0) is the standard axiom to derive known messages. The ($\vdash$E$\psi$) rule al-
lows the derivation of properties of attributes (if the attribute indeed satisfies the prop-
erty). The testing rule ($\vdash$T) and content analysis rule ($\vdash$C) are needed to mimic actors'
reasoning about different representations of information. For instance, suppose an ac-
tor knows the key for decrypting a certain message, but not in the message's context.
By trying out the key, he can generally find out it is a valid decryption key, learning 
the key in the context of the message. The testing rule captures this situation. The 
content analysis rule captures several other kinds of conclusions drawn from observing 
the same message contents in different contexts. They are discussed in detail at the end 
of this subsection. 

Figures [8], [9], and [10] model the primitives from Table [3] by construction and elim-
ination rules as in traditional deductive systems [29, 37]. Construction rules express
construction of a message from its components. For instance, an encryption can be con- 
structed from the plaintext and key. Elimination rules express recovery of the compo- 
nents of a message. For instance, an actor can recover the plaintext from an encryption 
if he knows the key. 
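
The interplay of construction and elimination rules can be sketched in Python for a small fragment of the primitives. This is a simplified illustration, not the deductive system of this paper: it ignores contexts (so it works like a traditional information-layer system), covers only concatenation, hashing, and symmetric encryption, and the tuple encoding and the function names `constructible` and `derivable` are assumptions made for the example.

```python
# Messages are nested tuples: ("cat", m1, m2, ...), ("hash", m),
# ("enc", key, plaintext); atomic messages are plain strings.

def constructible(known, target):
    """Construction rules: build `target` from already known parts."""
    if target in known:
        return True
    if isinstance(target, tuple) and target[0] in ("cat", "enc", "hash"):
        return all(constructible(known, part) for part in target[1:])
    return False

def derivable(known, target):
    """Apply elimination rules to a fixpoint, then try construction."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for m in list(known):
            if not isinstance(m, tuple):
                continue
            new = set()
            if m[0] == "cat":
                # Elimination: a concatenation yields its components.
                new.update(m[1:])
            elif m[0] == "enc" and constructible(known, m[1]):
                # Elimination: decrypt if the key can be obtained.
                new.add(m[2])
            if not new <= known:
                known |= new
                changed = True
    return constructible(known, target)
```

For instance, from an encryption of a concatenation and its key, both the components and a hash of a component are derivable, whereas a hash alone does not reveal its pre-image.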
Figure 9: Deductive system: inference rules for ZK proofs ($C_a$ a set of context mes-
sages; $m_i$, $n_i$ context messages; $p_i$ properties of $m_k$, i.e., every $p_i = \psi_j(m_k) \in \mathcal{D}^c$ for
some $j$, $k$)



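$$(\vdash CR)\quad \frac{C_a \vdash \{k^-, m_1, m_2, n\}}{C_a \vdash cred^{m_1}_{k^-}(m_2; n)}
\qquad
(\vdash CI)\quad \frac{C_a \vdash \{k^-, m_1, m_2, \{n_i\}_i\}}{C_a \vdash ICred^{m_1}_{k^-}(m_2; \{n_i\}_i)}$$

$$(\vdash EI_1)\quad \frac{C_a \vdash \{ICred^{m_1}_{k^-}(m_2; \{n_i\}_i), \mathcal{H}(m_1, n_1), k^+, n_3\}}{C_a \vdash \{m_1, n_1, n_2\}}$$

$$(\vdash EI_2)\quad \frac{C_a \vdash \{ICred^{m_1}_{k^-}(m_2; \{n_i\}_i), k^+, m_2, n_6\}}{C_a \vdash k^-}
\qquad
(\vdash EI_3)\quad \frac{C_a \vdash \{ICred^{m_1}_{k^-}(m_2; \{n_i\}_i), n_2\}}{C_a \vdash cred^{m_1}_{k^-}(m_2; \{n_2, n_5\})}$$

Figure 10: Deductive system: inference rules for anonymous credentials ($C_a$ a set of
context messages; $m_i$, $n$, $n_i$ context messages; $k^-$ context private key)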



Figure [8] covers several basic primitives. Hashes, symmetric encryption, concate-
nation, signature, and (labelled) asymmetric encryption are modelled as usual [29]. For
labelled encryption, note that the label can be derived from the encryption ($\vdash$EL'), but
to change it, the plaintext is needed, i.e., the label is unmodifiably attached. To derive
a session key using authenticated key agreement, an actor needs to know one of the
private keys used, the other public key, and both parties' randomness ($\vdash$CG), ($\vdash$CG').

Cryptographic protocols are also modelled by construction and elimination rules. 
The primitive represents the complete protocol transcript. The construction rule repre- 
sents one actor simulating the whole protocol run; the usual case when two actors run 
the protocol together and both contribute inputs is captured by traces (Section 3.3). Al-
though both actors involved learn the same protocol transcript, what information they
can derive from it will in general depend on the other knowledge they have. 

Figure [9] models privacy aspects of a large family of ZK proofs known as "$\Sigma$-
protocols" [30]. There are $\Sigma$-protocols for many properties; in particular, they are used
to prove properties of anonymous credentials [8]. The randomness for $\Sigma$-protocols is
of the form $\{n_p, n_v\}$, representing contributions by the prover and verifier, respectively.
Apart from the usual construction rule, there are two elimination rules: ($\vdash$EZ$_1$) states
that the property proven by a ZK proof can be seen from its transcript; ($\vdash$EZ$_2$) states
that the prover's secret may be derived from the public information and the prover's
randomness. We assume that parties do not re-use their randomness; also, because we
are only interested in privacy aspects, we have not included rules to derive randomness
used in the ZK proof. See Appendix A.1 for details.

Figure [10] models anonymous credentials and their issuing protocol based on SRSA-CL
signatures [8]. An anonymous credential is usually derived by the user from the
transcript of its issuing protocol ($\vdash$EI$_3$) (the issuer does not know $n_2$ and so does not
learn the credential); but it can also be constructed directly from its components ($\vdash$CR).
Similarly for the issuing protocol transcript itself ($\vdash$CI). Before the issuing protocol
takes place, the user needs to have sent a randomised commitment $\mathcal{H}(m_1, n_1)$ to her
secret identifier to the issuer. During the protocol, additional randomness $n_2, n_3, \ldots$ is
generated by the two parties; together with $n_1$, it forms the randomness component of the
ICred primitive. Inference rules ($\vdash$EI$_1$) and ($\vdash$EI$_2$) model the inference of secret infor-
mation from the transcript using randomness and other inputs to the protocol. As with
our model of ZK proofs, we only consider rules needed to infer personal information,
and assume non-re-use of randomness. In Appendix A.2 we explain why these rules
accurately capture privacy aspects.

Testability (*)                                                      Testing by...

$E'_n(m) \stackrel{?}{\Rightarrow} n$                                Trying decryption with key
$E_{k^+}(m) \stackrel{?}{\Rightarrow} k^-$                           Trying decryption with key
$E_{k^+}(m)_l \stackrel{?}{\Rightarrow} k^-$                         Trying decryption with key
$S_{k^-}(m) \stackrel{?}{\Rightarrow} \{k^+, m\}$                    Signature verification
$cred^{m_1}_{k^-}(m_2; n) \stackrel{?}{\Rightarrow} \{k^+, m_1, m_2\}$   Signature verification (§A.2)
$ZK(m_1; m_2; m_3; \{n_p, n_v\}) \stackrel{?}{\Rightarrow} m_2$      Verification of ZK proof
$ZK(m_1; m_2; m_3; \{n_p, n_v\}) \stackrel{?}{\Rightarrow} n_p$      Re-calculating commitment
$ICred^{m_1}_{k^-}(m_2; \{n_i\}_i) \stackrel{?}{\Rightarrow}$
    ... $\{m_1, k^+, n_2\}$                                          Re-calculating commitment
    ... $\{\mathcal{H}(m_1, n_1), k^+\}$                             Verification ZK proof 1
    ... $n_3$                                                        Commitment ZK proof 1
    ... $cred^{m_1}_{k^-}(m_2; \{n_2, n_5\})$                        Recognise signature
    ... $\{k^+, m_2\}$                                               Verification ZK proof 2
    ... $n_6$                                                        Commitment ZK proof 2

Table 4: Testability relation (*: $m$, $m_i$, $n$, $n_i$ context messages; $k^+$, $k^-$ context
public/private keys). The second column describes which test the actor(s) can perform.

The testing rule ($\vdash$T) (Figure [7]) expresses that for certain primitives, actors can test
whether specific inputs were used. For instance, an actor can try to verify a signature
$S_{k^-}(m)$ using any known public key $k'^+$ and message $m'$; if verification succeeds, he
learns that $k^+$ has the same contents as $k'^+$, and $m$ has the same contents as $m'$. We
call $\{k^+, m\}$ testable from $S_{k^-}(m)$, denoted $S_{k^-}(m) \stackrel{?}{\Rightarrow} \{k^+, m\}$. If $n$ is testable from
$m$ and an actor can deduce a message $n'$ with the same contents as $n$, then by ($\vdash$T) he
can also deduce $n$.

The testability relation $\stackrel{?}{\Rightarrow}$ is defined in Table [4]. Symmetric and (labelled) asym-
metric encryptions allow for testing of the decryption key.¹ Signatures allow for sig-
nature verification. Similarly, an anonymous credential can be verified to correspond
to a given verification key, message and secret identifier. From a ZK proof, the public
information and user's randomness can be tested (Appendix A.1). A credential issu-
ing protocol transcript allows for testing of various nonces and information used (Ap-
pendix A.2); in particular, all messages needed for the elimination rules ($\vdash$EI$_1$) to ($\vdash$EI$_3$)
are testable.

¹ This is an over-estimation in case the plaintext is unknown, random, and unformatted, e.g., a nonce.

Figure 11: Deduction of $id|^\pi_1$ given message set $C_a = \{\mathcal{H}(id, age)|^\pi_1, id|^\eta_1, age|^\eta_1\}$ (Ex-
ample 5).

Example 2. Let $C_a = \{E'_k(da)|^\pi_\cdot, l|^\eta_\cdot\}$ be the set of messages known by an actor $a$, with
$k|^\pi_\cdot \doteq l|^\eta_\cdot$. Then $C_a \vdash da|^\pi_\cdot$ can be deduced as follows:

$$\dfrac{\dfrac{}{C_a \vdash E'_k(da)|^\pi_\cdot}\,(\vdash 0) \qquad \dfrac{\dfrac{}{C_a \vdash E'_k(da)|^\pi_\cdot}\,(\vdash 0) \quad \dfrac{}{C_a \vdash l|^\eta_\cdot}\,(\vdash 0)}{C_a \vdash k|^\pi_\cdot}\,(\vdash T)}{C_a \vdash da|^\pi_\cdot}\,(\vdash EE)$$

The deduction models an actor(s) testing whether $l|^\eta_\cdot$ is the decryption key for $E'_k(da)|^\pi_\cdot$
($\vdash$T). By learning it, the actor(s) can decrypt the message ($\vdash$EE). □
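
The reasoning in this example can be mimicked by a small Python sketch that combines one testing step with one decryption step. It is a simplified illustration under stated assumptions: context items are written as (name, domain) pairs, the dictionary `contents` stands in for the composition of the two maps to the contents layer, and the function name `detect` and the sample strings are hypothetical.

```python
# Atoms are (name, domain) pairs; an encryption is ("enc", key, plaintext).
# `contents` plays the role of the contents of each context item
# (sample data: the key is the same string in both domains).
contents = {
    ("k", "pi"): "0xA1",
    ("l", "eta"): "0xA1",
    ("da", "pi"): "payload",
}

def content_equal(x, y):
    """True iff both items have known, identical contents."""
    return contents.get(x) is not None and contents.get(x) == contents.get(y)

def detect(known):
    """Close `known` under testing and decryption of symmetric encryptions."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for m in list(known):
            if m[0] != "enc":
                continue
            _, key, body = m
            # Testing: a known item with the same contents as the key
            # reveals the key in the context of the ciphertext.
            if key not in known and any(content_equal(x, key) for x in known):
                known.add(key)
                changed = True
            # Elimination: decrypt once the key is known in this context.
            if key in known and body not in known:
                known.add(body)
                changed = True
    return known
```

Starting from the ciphertext in domain "pi" and the key observed in domain "eta", the sketch first adds the key in domain "pi" and then the plaintext; without the second observation nothing new is detected.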

The final rule of our deductive system is the content analysis rule ($\vdash$C) (Figure [7]).
It models conclusions an actor can draw from seeing messages with the same contents 
in different contexts. The statement of the rule relies on the syntactic structure of 
messages, which we first elaborate on. 

The syntactic structure of messages describes how they are constructed from data 
items, identifiers, and non-personal information using cryptographic primitives. The 
components of these primitives are numbered according to the order defined by the
corresponding construction rules in Figures [8], [9], and [10]: e.g., $E'_n(m)$ has first com-
ponent $m$ and second component $n$. (For the AKA primitive, we take the private keys
and nonces as submessages.) Recursively, submessages $n$ of $m$ have a well-defined
"position" $z$ in $m$, and we write $n = m@z$.

Example 3. The message $\mathcal{H}(E'_n(m))$ has four submessages:

(i) $\mathcal{H}(E'_n(m))@\epsilon = \mathcal{H}(E'_n(m))$; (ii) $\mathcal{H}(E'_n(m))@1 = E'_n(m)$; (iii) $\mathcal{H}(E'_n(m))@11 = m$;
and (iv) $\mathcal{H}(E'_n(m))@12 = n$. □
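
Positions of submessages are easy to compute recursively. The following Python sketch enumerates them for messages encoded as nested tuples; the encoding, the function name `submessages`, and the use of the empty string for the empty position are assumptions made for illustration. Positions are written as digit strings, so the sketch is only unambiguous for primitives with fewer than ten components.

```python
def submessages(msg, pos=""):
    """Yield (position, submessage) pairs; "" stands for the empty position.

    A composite message is a tuple (primitive, component_1, component_2, ...),
    with components listed in the order of the construction rule; atomic
    messages are plain strings.
    """
    yield pos, msg
    if isinstance(msg, tuple):
        for index, part in enumerate(msg[1:], start=1):
            yield from submessages(part, pos + str(index))

# The message of Example 3: a hash of a symmetric encryption whose first
# component is m and whose second component is n.
example = ("hash", ("enc", "m", "n"))
positions = dict(submessages(example))
```

For the message of Example 3, `positions` contains exactly the four entries listed there, at positions "", "1", "11", and "12".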

If two context messages $m_1$ and $m_2$ are content equivalent, then the properties
of $\sigma$ and $\tau$ imply content equivalence of their submessages. That is, if $m_1@z$ and
$m_2@z$ are defined (i.e., there exists a submessage at position $z$), then they are content
equivalent. In addition, if $m_1@z = k^+$ and $m_2@z = k'^+$, then not only $k^+ \doteq k'^+$ follows,
but also $k^- \doteq k'^-$, and vice versa. Similarly, if $m_1$ and $m_2$ contain data items satisfying
a property, the content equivalence of that property is also implied. Formally:

Definition 4. Let $m_1$, $m_2$, $n_1$, $n_2$ be context messages. The pair $(m_1, m_2)$ is evidence
for $n_1 \doteq n_2$, denoted $(m_1 \doteq m_2) \Rightarrow (n_1 \doteq n_2)$, if $m_1 \doteq m_2$, and for some $z$, $k$ and all
$i \in \{1, 2\}$, one of the following three conditions holds:

1. $n_i = m_i@z$;

2. $m_i@z$ and $n_i$ form a private/public key pair;

3. $n_i$ is a property $\psi_k$ of $m_i@z$.

The "content analysis" inference rule ($\vdash$C) then states that if an actor can derive
evidence $(m_1, m_2)$ for $n_1 \doteq n_2$ and he can derive a message with $n_1$ in it, then he can
also derive the message with $n_1$ replaced by $n_2$, and vice versa.

Example 5. Let $C_a = \{\mathcal{H}(id, age)|^\pi_1, id|^\eta_1, age|^\eta_1\}$ be the set of messages known by actor
$a$ with $id \in I$, $age \in D$ such that $id|^\pi_1 \doteq id|^\eta_1$ and $age|^\pi_1 \doteq age|^\eta_1$. $C_a \vdash \mathcal{H}(id, age)|^\pi_1$ holds,
and by ($\vdash$CC), ($\vdash$CH) we have $C_a \vdash \mathcal{H}(id|^\eta_1, age|^\eta_1)$. From this, $a$ knows that $id|^\pi_1 \doteq id|^\eta_1$
(as well as $age|^\pi_1 \doteq age|^\eta_1$). By ($\vdash$C) he can then deduce $id|^\pi_1$ (Figure [11]). In the same
way also $C_a \vdash age|^\pi_1$ follows. □

In [71], we formally compared the expressiveness of our context-layer deductive
system to that of the corresponding standard information-layer deductive system. We
showed that for any piece of information, there is at least one context-layer represen-
tation that our system can derive. Therefore, no information is lost. The deductive
system presented here is an extension to the one presented in [71], and the same line of
reasoning applies here as well. Intuitively, for every elimination rule, testing rules can 
be applied to derive context messages that satisfy the prerequisites for the rule. Thus, 
one elimination rule at the information layer corresponds to one elimination rule and 
some testing rules at the object layer. 



3.2.3 Associability 

After detectability, we now consider the other type of knowledge held by an actor:
associability. For associability, in addition to messages, the set $C_a$ contains the context
entities that $a$ knows: $C_a \subseteq \mathcal{C}^c \cup \mathcal{E}^c$. Associations between context items follow from
properties of both $\sigma$ and $\tau$. First, context items in one context are related, and so is the
same entity in different contexts (properties 3, 4 of $\sigma$). Second, context identifiers with
equal contents are equal (property 1 of $\tau$). Thus, define the associability relation $\leftrightarrow_a$
of actor(s) $a$ as the minimal equivalence relation on $\mathcal{O}^c$ such that:

1. Forall^elf eC a n£ c :e\*~ a e\ 1 }; 

2. Forall^,^6 c :^^ a y|^; 

3. If $C_a \vdash m_1$, $C_a \vdash m_2$, and $(m_1 \doteq m_2) \Rightarrow (i_1 \doteq i_2)$ for $i_1, i_2 \in \mathcal{I}^c$, then $i_1 \leftrightarrow_a i_2$.

For associability by a coalition $\{a_1, \ldots, a_n\} \subseteq \mathcal{A}$, written as $\leftrightarrow_{a_1 + \ldots + a_n}$, the above rules
are applied to $C_a = C_{a_1} \cup \ldots \cup C_{a_n}$.

Note that actors may associate items which they cannot detect. In fact, because of
transitivity of $\leftrightarrow_a$, this may help to establish a relation between detectable items:

Example 6. Let C a = {{^^.(^li). ^ilF 7 . {E s hakey\.( id \i)> d '\iW) be the set of 
messages known by an actor a, where E^a^ (id\ l )|' 1 = E s i m i cey i ( \%. Then, id\^ 
id\* by condition 2 for ++ a (even though the actor can detect neither context identifier). 
By condition 1 for ++ a and transitivity, d\^ <-^ a d'\f follows. □ 
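
Because associability is defined as a minimal equivalence relation, it can be computed as a closure over a set of direct links. The following Python sketch illustrates this with a union-find structure. It is a simplification under our own assumptions and not the Prolog implementation: context items are encoded as hypothetical `(variable, domain, profile)` triples, only the "same context" links and externally supplied evidence links are modelled, and the names `Associability` and `build` are illustrative.

```python
from itertools import combinations


class Associability:
    """Smallest equivalence relation containing all links added to it."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Register unseen items as their own representative.
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            # Path halving keeps the trees shallow.
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def link(self, x, y):
        self.parent[self.find(x)] = self.find(y)

    def associable(self, x, y):
        return self.find(x) == self.find(y)


def build(items, evidence):
    """Close the direct links under reflexivity, symmetry and transitivity.

    items:    iterable of (variable, domain, profile) triples (assumed encoding)
    evidence: pairs of items linked by content analysis, supplied by the caller
    """
    relation = Associability()
    by_context = {}
    for item in items:
        relation.find(item)
        by_context.setdefault(item[1:], []).append(item)
    # Items in the same context (domain, profile) are associable.
    for group in by_context.values():
        for x, y in combinations(group, 2):
            relation.link(x, y)
    # Identifiers for which evidence of equal contents exists are associable.
    for x, y in evidence:
        relation.link(x, y)
    return relation


# Two contexts, each holding an identifier and a data item.
items = [("id", "kappa", "k"), ("d", "kappa", "k"),
         ("id", "lambda", "l"), ("d2", "lambda", "l")]
linked = build(items, evidence=[(items[0], items[2])])
unlinked = build(items, evidence=[])
```

In the spirit of Example 6, `linked.associable(items[1], items[3])` holds purely by transitivity through the two identifiers, whereas in `unlinked` the two data items remain unrelated.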



3.3 Message transmission 

Actors increase their knowledge of personal information by exchanging messages. We
model the increase of knowledge in a particular system evolution by a sequence of
states. A state comprises the sets $\{C^t_x\}_{x \in \mathcal{A}}$ of messages and entities known by each actor
$x \in \mathcal{A}$ at time $t = 0, 1, \ldots, n$. The three-layer model is defined independently from the
state, and contains all items and messages within the system (including items generated
during the protocol execution, such as nonces). For each actor (or coalition of actors)
$a$, the state $\{C^t_x\}_{x \in \mathcal{A}}$ determines detectability $C^t_a \vdash \ldots$ as defined by the deductive system
in Section 3.2.2 and associability $\leftrightarrow_a$ as defined in Section 3.2.3 (we write $\leftrightarrow^t_a$ if we
want to make the state explicit).

The transition from one state to another corresponds to a message transmission. 
A message transmission involves two parties; each party is modelled by an identifier 
representing its communication address in the transmission. The simplest type of mes- 
sage transmission is one actor sending a message to another; two other types model the 
execution of cryptographic protocols. 

Definition 7. A message transmission can be of three types:

1. $a \rightarrow b : m$;

2. $a \mapsto b : ZK(m_1; m_2; m_3; m_4)$;

3. $a \mapsto b : ICred^{m_1}_{k^-}(m_2; n)$,

with $a$, $b$ context identifiers; $k^-$ a context private key; and $m$, $m_i$, $n$ context
messages. A trace $\mathcal{T}$ is a sequence of message transmissions $t_i$, denoted $\mathcal{T} = t_1; t_2; \ldots; t_n$.

In message transmission $a \mapsto b : ZK(..)$, $a$ is the prover and $b$ the verifier; in
$a \mapsto b : ICred(..)$, $a$ is the user and $b$ the issuer.

To verify the validity of a message transmission, it is necessary to verify whether
an actor $a$ can send a message $m$ in a certain state $\{C^t_x\}_{x \in \mathcal{A}}$. He can certainly do so if he
can derive the message from his knowledge base ($C^t_a \vdash m$); however, it is also possible
that $m$ (or one of its submessages) has not been observed by any actor in that state, so
$C^t_a \vdash m$ does not hold (hereafter we call such messages undetermined). In this case, $a$ needs to
"instantiate" $m$ by deriving a message equivalent to $m$ from the messages he knows.
Note that at that point in time, $a$ may have different choices about what
information to send. However, we are only interested in the choice determined by $a$
corresponding to the system evolution we consider, and not in any alternative choices
leading to alternative system evolutions. We first formalise the notion of "determined".

Definition 8. Let $\{C^t_x\}_{x \in \mathcal{A}}$ be a state at time $t$.

• Context item $n$ is determined in state $\{C^t_x\}_{x \in \mathcal{A}}$ if there is an actor $a \in \mathcal{A}$, message
$m \in C^t_a$, and $z$ such that: (1) $n = m@z$; or (2) $n$ and $m@z$ form a private/public key
pair; or (3) $n$ is a property of $m@z$.

• Context message $n$ is determined in a state if all context items occurring in $n$ are
determined.

• Context $(\eta, k)$ is determined in a state if at least one context item $x|^{\eta}_{k}$ in $(\eta, k)$ is
determined (i.e., the data subject of the context is defined).

The following definition formalises the intuition that to send an undetermined mes- 
sage m, a should derive an equivalent message n in which all undetermined items are 
"instantiated": 

Definition 9. Let $\{C^t_x\}_{x \in \mathcal{A}}$ be a state at time $t$. A message $m$ is determinable by actor
$a$ in $\{C^t_x\}_{x \in \mathcal{A}}$ if $C^t_a \vdash n$ for some $n \equiv m$ satisfying the following conditions:

1. Whenever $m@z$ is determined, then $m@z = n@z$.

2. Whenever $m@z_1 = m@z_2$, then $n@z_1 = n@z_2$.

3. If $m@z$ is a context item whose context $(\eta, k)$ is determined in $\{C^t_x\}_{x \in \mathcal{A}}$, then
$n@z \leftrightarrow_a *|^{\eta}_{k}$, where $*|^{\eta}_{k}$ represents any message associable to $(\eta, k)$.

4. If $m@z_1$, $m@z_2$ are context items whose context $(\eta, k)$ is not determined in
$\{C^t_x\}_{x \in \mathcal{A}}$, then $n@z_1 \leftrightarrow_a n@z_2$.

$t =$                                                  Determinable by a                          Determinable by b

$a \rightarrow b : m$                                  $\{a, b, m\}$

$a \mapsto b : ZK(m_1; m_2; m_3; \{n_a, n_b\})$        $\{a, b, m_1, n_a\}$                       $\{n_b\}$

$a \mapsto b : ICred^{m_1}_{k^-}(m_2; \{n_i\}_{i=1}^{7})$   $\{a, b, k^+, m_1, n_1, n_2, n_3, n_7\}$   $\{k^+, k^-, m_2, n_4, n_5, n_6\}$

Table 5: Determinability requirements for the different types of message transmissions

The message n chosen as instantiation for m needs to be equivalent to m, i.e., have 
the same structure as m and contain the same information; it also needs to satisfy the 
additional constraints from conditions 1 to 4. Condition 1 states that submessages of m
that are already determined should be the same in n. Condition 2 states that when 
the same item occurs multiple times in m, then all occurrences should have the same 
instantiation in n. By condition 3, if m contains an undetermined item m@z whose 
context is determined, the actor should be able to associate the instantiation of m @z to 
that context. Similarly, condition 4 requires that if m contains more than one item from 
the same context, then the actor should be able to associate their instantiations. Note 
that if m is determined, then determinability is the same as derivability. 

Example 10. Let $m = \{id, age\}|^{\pi}_{u}$ be the message to be sent by an actor $a$ in a state in
which $id|^{\pi}_{u}$ is determined and $age|^{\pi}_{u}$ is undetermined. Then, $m$ is determinable if $a$ can
derive a message $\{id|^{\pi}_{u}, *\}$ such that $* \equiv age|^{\pi}_{u}$ and $id|^{\pi}_{u} \leftrightarrow_a *$. □

We now define when message transmissions and traces are valid, i.e., they can be 
executed. For message transmissions of type 1, validity means determinability by the 
sender of the message and of the sender and receiver identifiers. For the other types, the 
initiator of the protocol should determine the sender and receiver addresses, and both 
parties contribute to the construction of the message to be transmitted. A transmission 
results in a new state in which both parties know the protocol transcript. This intuition 
can be extended to traces. 

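Definition 11. Let $\{C^t_x\}_{x \in \mathcal{A}}$ be a state at time $t$. Let $\mathbf{a}$ denote the entity corresponding
to context identifier $a$ (i.e., $\mathbf{a} \leftrightarrow a$), and $\mathbf{b}$ the entity corresponding to $b$ (i.e., $\mathbf{b} \leftrightarrow b$).

• The message transmission $t$ is valid in $\{C^t_x\}_{x \in \mathcal{A}}$ if $\mathbf{a}$ and $\mathbf{b}$ can determine the
messages indicated in Table 5.

• The resulting state of message transmission $t = a \rightarrow b : m$ or $t = a \mapsto b : m$ from
$\{C^t_x\}_{x \in \mathcal{A}}$ is the state $\{C^{t+1}_x\}_{x \in \mathcal{A}}$ such that $C^{t+1}_{\mathbf{a}} = C^t_{\mathbf{a}} \cup \{a, b, m\}$, $C^{t+1}_{\mathbf{b}} = C^t_{\mathbf{b}} \cup \{a, b, m\}$.

• The validity and resulting state of a trace $\mathcal{T} = t_1; \ldots; t_n$ are defined iteratively.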
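
The state update in Definition 11 can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions rather than the Prolog implementation: a state is a dictionary from actor names to frozensets of known items, messages are strings or tuples, actors other than the two parties are assumed to keep their knowledge unchanged, and the validity check against Table 5 is deliberately omitted. The names `transmit` and `run` and the item labels are hypothetical.

```python
def transmit(state, sender, receiver, a, b, m):
    """Return the state after `sender` (address a) sends m to `receiver` (address b).

    Both parties learn the transcript {a, b, m}. The input state is not
    modified; validity (determinability) is NOT checked in this sketch.
    """
    transcript = {a, b, m}
    new_state = dict(state)
    new_state[sender] = state[sender] | transcript
    new_state[receiver] = state[receiver] | transcript
    return new_state


def run(state, trace):
    """Apply the transmissions of a trace one after another."""
    for step in trace:
        state = transmit(state, *step)
    return state


# A toy two-party exchange: a client asks a server about a user.
initial = {
    "gov": frozenset({"ip_gov", "id_db", "age_db"}),
    "store": frozenset({"ip_store", "ip_gov", "id_user"}),
}
trace = [
    ("store", "gov", "ip_cli", "ip_srv", "id_u"),
    ("gov", "store", "ip_srv", "ip_cli", ("id_u", "age_u")),
]
final = run(initial, trace)
```

Keeping each state immutable makes it easy to retain the whole sequence of states of a system evolution, which is what the time-indexed detectability and associability relations refer to.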

For ZK proofs, the prover needs to know the private information for the proof, and
both parties contribute randomness. Note that to participate in the protocol, the verifier
does not need to know the public information or the properties to be proven; however,
he does need to know this information to be able to interpret the proof (i.e., to apply
the testing rule). For credential issuing, the user needs to know her secret identifier $m_1$,
randomness, and the issuer's public key; the issuer needs to know his private/public
key pair, the attributes to be signed, and additional randomness.

    {Let $\models$ denote the deductive system without the content analysis rule}
     1: for all context items $m'$: $m' \doteq m$, $C_a \models m'$ do
     2:   for all context items $p$, $p'$: $m@z = p$, $m'@z = p'$, $p \neq p'$ do
     3:     {Find sequence of evidence for $p \doteq p'$ using breadth-first search}
     4:     $Q \leftarrow \{p\}$ {queue of items to check}; $P \leftarrow \{\}$ {already checked}; found $\leftarrow$ false
     5:     while $Q \neq \{\} \wedge \neg$found do
     6:       $q \leftarrow$ pop($Q$); $P \leftarrow P \cup \{q\}$ {move $q$ from queue to already checked}
     7:       if $q = p'$ then found $\leftarrow$ true; break {evidence for $p \doteq p'$ found} end if
     8:       for all context items $q'$: $q'$ occurs in message in $C_a$, $q' \doteq q$, $q' \notin P \cup Q$ do
     9:         {Try to find evidence for $q \doteq q'$}
    10:         for all context items $n$: $C_a \models n$, $n$ is minimal w.r.t. $q$ do
    11:           if $\exists n'$: $C_a \models n'$: $(n \doteq n') \Rightarrow (q \doteq q')$ then $Q \leftarrow Q \cup \{q'\}$ end if
    12:         end for
    13:         for all context items $n'$: $C_a \models n'$, $n'$ is minimal w.r.t. $q'$ do
    14:           if $\exists n$: $C_a \models n$: $(n \doteq n') \Rightarrow (q \doteq q')$ then $Q \leftarrow Q \cup \{q'\}$ end if
    15:         end for
    16:       end for
    17:     end while
    18:     if $\neg$found then break {No such $p'$ found: try next $m'$} end if
    19:   end for
    20:   return true {Actor has evidence that $m \doteq m'$ for an $m'$ such that $C_a \models m'$}
    21: end for
    22: return false {For all $m'$ such that $m' \doteq m$, $C_a \models m'$: actor has no evidence for $m \doteq m'$}

Figure 12: Algorithm implementing the deductive system: given message set $C_a$ and
context message $m$, check whether $C_a \vdash m$

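Example 12. Consider actors $\mathcal{A} = \{gov, store\}$ communicating using addresses $ip|^{\cdot}_{gov}$,
$ip|^{\cdot}_{store}$. Actor gov knows its own address, and holds a database containing user
information: $C^0_{gov} = \{ip|^{\cdot}_{gov}\} \cup \{id|^{\delta}_{k}, age|^{\delta}_{k}, \ldots\}$. Actor store knows a user identifier and the
addresses of itself and gov: $C^0_{store} = \{ip|^{\cdot}_{store}, ip|^{\cdot}_{gov}, id|^{\cdot}_{user}\}$. An instance $\pi$ of a simple
protocol in which gov and store exchange information about the user $u$ (store is the
client cli, and gov is the server srv) is modelled by the following valid trace:

$ip|^{\pi}_{cli} \rightarrow ip|^{\pi}_{srv} : id|^{\pi}_{u}$; $ip|^{\pi}_{srv} \rightarrow ip|^{\pi}_{cli} : \{id|^{\pi}_{u}, age|^{\pi}_{u}\}$,

where $ip|^{\pi}_{cli} \equiv ip|^{\cdot}_{store}$, $ip|^{\pi}_{srv} \equiv ip|^{\cdot}_{gov}$, $id|^{\pi}_{u} \equiv id|^{\delta}_{k} \equiv id|^{\cdot}_{user}$, and $age|^{\pi}_{u} \equiv age|^{\delta}_{k}$. The
resulting state from this trace is $\{C^2_x\}_{x \in \mathcal{A}}$. In this state, $C^2_{store} \vdash age|^{\pi}_{u}$ and $id|^{\cdot}_{user} \leftrightarrow_{store} age|^{\pi}_{u}$.
□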

3.4 Prolog Implementation 

We have implemented the framework presented in this section using Prolog. Here
we describe our implementation² and its efficiency in general terms; for details, refer
to the documentation of the implementation.

² The implementation, along with its documentation, can be downloaded at
http://www.mobiman.me/publications/downloads/



3.4.1 Deductive system 

Our deductive system is a traditional deductive system [29, 37] with two extra rules:
testing and content analysis. Let us first ignore content analysis, and only consider the
construction, testing and elimination rules. Construction rules generally derive
messages from submessages; testing and elimination rules derive submessages from
messages using "additional prerequisites" (e.g., the key for the decryption rule ($\vdash$EE)). As
testing/elimination and construction cancel each other out, there is no point in applying
testing/elimination to the result of a construction rule. Thus, to check the derivability of
a message $m$, we try to find a message $n$ in which it occurs as submessage, and try to
derive $m$ from it using elimination and testing. If this does not work, we repeat the
procedure for $m$'s submessages: if successful, then $m$ can be obtained from them with
a construction rule.

While trying elimination or testing rules, we need to check the derivability of the 
additional prerequisites n. We claim that this check can be done at the contents layer 
(so a simple deductive system suffices). For the testing rule this is clear; however, it 
also holds for elimination rules because their additional prerequisites can always be 
obtained from a content equivalent message using the testing rule. 

Thus, in terms of evaluation, our deductive system differs from standard systems 
in two ways. First, for elimination rules, the additional prerequisites are evaluated 
not using the deductive system itself, but using a (standard) deductive system at the 
contents layer. Second, testing rules are added which are evaluated in the same way as 
elimination rules. Intuitively, our deductive system is thus not much harder to evaluate 
than a corresponding standard deductive system. (However, typically it will be run on 
a larger message set because information has multiple representations.) 

To implement the full deductive system, we use the fact that any deduction in the full
deductive system can be transformed into a deduction deriving the same message that
satisfies:

• After content analysis rules, no other rules are applied to a message.

• In any application of ($\vdash$C), the message $n_2$ and the message $n_1$ from which it is
derived only differ by one context item at one position.

• In any application of ($\vdash$C), the messages $m_1$ and $m_2$ are derived without content
analysis; also, $m_1$ is minimal with respect to $n_1$ in the sense that no elimination
or testing rule can be applied to it to obtain a submessage containing $n_1$; and/or
$m_2$ is minimal with respect to $n_2$.



The algorithm in Figure 12 is an imperative translation of our Prolog implementation;
by the above properties, it implements derivability in our full deductive system.
Namely, to derive $m$ from a given message set $C_a$, it takes all messages $m' \doteq m$ such that
$C_a \vdash m'$, and tries to obtain $m'$ from $m$ by content analysis in a context-item-by-context-item
fashion. For all positions $z$ at which $m$ and $m'$ differ, the algorithm performs a
breadth-first search for messages obtained from $m$ by content analysis at position $z$,
until it finds $m$ with $m@z$ replaced by $m'@z$. The breadth-first search is performed by
first searching for a minimal message using testing and elimination rules (lines 10 and
13); and then searching for a content equivalent message using testing, elimination and
construction rules (lines 11 and 14). We did not optimise this algorithm in terms of
complexity. Indeed, in practice, most context items are content equivalent to only a few
other items, so the search space for the algorithm is very limited.
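
The breadth-first search at the core of this procedure can be sketched in Python as follows. This is an illustrative simplification under our own assumptions, not a translation of the actual Prolog code: the function name `find_evidence_chain` is hypothetical, `candidates(q)` stands in for "context items occurring in known messages that are content equivalent to q", and `has_evidence(q, q2)` is an oracle standing in for the two inner loops of Figure 12 that look for a pair of derivable messages serving as evidence.

```python
from collections import deque


def find_evidence_chain(p, p_prime, candidates, has_evidence):
    """Breadth-first search for a chain of evidence leading from p to p_prime.

    candidates(q)       -> iterable of items content equivalent to q (assumed oracle)
    has_evidence(q, q2) -> True if the actor holds evidence that q and q2 have
                           equal contents (assumed oracle)
    """
    queue = deque([p])   # items still to check
    checked = set()      # items already checked
    while queue:
        q = queue.popleft()
        checked.add(q)
        if q == p_prime:
            return True
        for q_next in candidates(q):
            if q_next in checked or q_next in queue:
                continue
            if has_evidence(q, q_next):
                queue.append(q_next)
    return False


# Toy instance: p is linked to x, and x to the target; y is a dead end.
equivalents = {"p": ["x", "y"], "x": ["p_prime"], "y": []}
evidence = {("p", "x"), ("x", "p_prime")}

reachable = find_evidence_chain(
    "p", "p_prime",
    candidates=lambda q: equivalents.get(q, []),
    has_evidence=lambda q, q2: (q, q2) in evidence,
)
unreachable = find_evidence_chain(
    "p", "p_prime",
    candidates=lambda q: equivalents.get(q, []),
    has_evidence=lambda q, q2: (q, q2) == ("p", "x"),
)
```

Since every item enters the queue at most once, the number of iterations is bounded by the number of content-equivalent items, which matches the observation that the search space is small in practice.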

3.4.2 Associability and Trace Validity 

The algorithm for checking the associability of two contexts is similar to the
previous algorithm. In particular, it starts with one context $(\eta, k)$ and uses breadth-first
search to find associable contexts. This involves finding all identifiers (or entities) that
occur in $(\eta, k)$ and all other contexts in which that identifier occurs. The algorithm
then searches evidence for content equivalence of the different representations of the
identifier.

The main task in implementing traces is to check for determinability of a message
$m$; that is, to find a derivable message $n$ that is equivalent to $m$ and satisfies properties
(1) to (3) from Definition 9. Properties (1) and (2) place restrictions on the form of the
message, which can be expressed in terms of free variables in a Prolog query to the
deductive system. For property (3) we check associability as above.
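
To illustrate how properties (1) and (2) restrict the form of the instantiation, the following Python sketch mimics a query with free variables. It is only an analogy to the Prolog query mechanism, under our own assumptions: messages are nested tuples, determined submessages appear literally in the template, each undetermined item is represented by a named variable (the hypothetical class `Var`), and neither equivalence to the original message nor associability is checked here.

```python
class Var:
    """A free variable standing for an undetermined context item."""

    def __init__(self, name):
        self.name = name


def match(template, candidate, binding=None):
    """Match a candidate message against a template with free variables.

    Returns a dict of variable bindings, or None if the candidate does not fit.
    Literal parts must coincide (property 1); a variable occurring several
    times must be bound to the same value everywhere (property 2).
    """
    binding = dict(binding or {})
    if isinstance(template, Var):
        if template.name in binding:
            return binding if binding[template.name] == candidate else None
        binding[template.name] = candidate
        return binding
    if isinstance(template, tuple):
        if not isinstance(candidate, tuple) or len(template) != len(candidate):
            return None
        for sub_template, sub_candidate in zip(template, candidate):
            binding = match(sub_template, sub_candidate, binding)
            if binding is None:
                return None
        return binding
    return binding if template == candidate else None


# The identifier is determined; the age item is undetermined and occurs twice.
template = ("msg", "id_u", Var("age"), ("H", Var("age")))
consistent = match(template, ("msg", "id_u", "age_db", ("H", "age_db")))
inconsistent = match(template, ("msg", "id_u", "age_db", ("H", "other")))
wrong_literal = match(template, ("msg", "id_v", "age_db", ("H", "age_db")))
```

A full determinability check would additionally enumerate only derivable candidates and test the associability conditions on the resulting bindings.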



4 Privacy Analysis of IdM Systems 



In this section we apply our method to analyse the IdM systems presented in
Section 2.4. First, we model the actors and personal information in the case study presented
in Section 2.3, describe who knows what initially, and translate the requirements in
Table 1 to formal, verifiable properties (Section 4.1). Then, we formalise the IdM systems
(Section 4.2). Finally, we verify whether the IdM systems satisfy the requirements and
discuss the results (Section 4.3).



4.1 Formalizing the Case Study and Requirements 



Figure 13 shows the actors in the case study and the contexts they occur in, together
with the initial knowledge held by each of them. The actors are listed in Figure 13(a).
(In this case study, actors and entities are the same.) The trusted third party ttp is
included because of the anonymity revocation requirement; however, note that it only
occurs in the Identity Mixer and Smartcard schemes.



Figure 13(b) lists the domains we use. The $\cdot$ domain contains publicly known
identifiers for the identity and service providers, and private/public key pairs. The $\iota$, $\kappa$,
and $\lambda$ domains represent databases of user information held by the respective parties.
The $\pi$, $\eta$, $\xi$, and $\zeta$ domains represent the communication protocols that are executed
during the case study. We consider two separate service provisions: this is needed to
analyse session unlinkability. For simplicity, all communication related to one service
provision is modelled in a single domain. This expresses that parties involved in service
provision without communicating directly are able to link their views of the protocol.
Alternatively, each pair of communication partners could have a separate domain.



Figure 13(c) shows the profiles representing the actors in the different domains.
For instance, in the $\cdot$, $\iota$, $\kappa$ and $\lambda$ domains, Alice is represented by the al profile; in the
$\pi$, $\eta$, $\xi$, and $\zeta$ domains, she is represented by u. By naming these profiles differently,
we emphasise that actors learn the information not as information about Alice, but as
information about "the purchaser in transaction x", etc.



Figures 13(d) and 13(e) define the initial state $\{C^0_x\}_{x \in \mathcal{A}}$ of the case study at the
context layer. The information-layer and contents-layer descriptions result as follows.
When context items about the same entity using the same variable are denoted in the
normal font (e.g. $i_{ii}|^{\kappa}_{al}$ and $i_{ii}|^{\pi}_{u}$), they are equivalent; when denoted in boldface (e.g.
$\mathbf{ip}|^{\pi}_{u}$, $\mathbf{ip}|^{\eta}_{u}$), they are all pairwise non-equivalent. Variables of the form $i$, $i_*$, $k^+_*$, $k^-_*$, $ip$
(for any $*$) denote identifiers; variables $d_*$ denote data items; other variables denote
non-personal information. All representations of a single information item use the
same variable. Because this case study includes only one data subject, all pieces of
information have unique contents, i.e., the information and contents layers coincide.

[Figure 13, panel (e): Information about Alice known initially by the different actors: rows
represent types of information about Alice; the columns labelled al, ii, is, and bs represent
the initial knowledge held by the respective actors]

Figure 13: Formalisation of the case study

Figure 13(d) defines the information available about ii, is, and bs. This information 
consists of a private/public key pair for each of the actors, and an identifier for ii, is, 
and bs. The public keys and identifiers are known by everybody; each actor also knows 
his own private key. 

Figure 13(e) lists the personal information known initially about Alice, grouped by
variable. For instance, $d_1$ represents a city; Alice knows her city as $d_1|^{\iota}_{al}$, and ii knows
it as $d_1|^{\kappa}_{al}$. We assume that the actual attribute exchange between user and identity
provider has already taken place, as shown in the $\kappa$ and $\lambda$ domains. Knowledge about
Alice is also present in the $\pi$, $\eta$, $\xi$ and $\zeta$ domains representing protocols. Knowledge
of $i_{ii}|^{\pi}_{u}$, $i_{is}|^{\eta}_{u}$ held by Alice and the respective identity providers represents the fact that
Alice has authenticated to them. In the context of the two service provisions, Alice
knows that she is the data subject ($al|^{\xi}_{u}$, $al|^{\zeta}_{u}$); the service provider knows transaction
details $d_7|^{\xi}_{u}$, $d_7|^{\zeta}_{u}$. Alice knows her IP address $ip|^{*}_{u}$, $* \in \{\pi, \eta, \xi, \zeta\}$; note that it is
assumed to change dynamically between sessions.

In Table 6 we formalise the requirements from Table 1 with respect to the case
study. AX and AR translate to detectability and associability properties of elements
in our model. For AX, note that bs can always associate the personal information of
the user to the purchase because of the common context $(\xi, u)$ or $(\zeta, u)$, so we do
not check this. Undetectability requirements are straightforward to formalise; e.g.,
property-attribute undetectability means undetectability by bs of the context item $d_2|^{\delta}_{p}$
in any context $(\delta, p)$. Involvement requirements are defined as not knowing that a
service or identity provider has a profile in the same domain as the user; for instance,
for IM, there should be no domain in which ii can link the idp2 profile to is and
the u profile to al. Linkability requirements translate to contexts not being associable
by an actor or coalition.

[Table 6 gives a formalisation for each of the following requirements: Attribute exchange (AX);
Anonymity revocation (AR); Irrelevant attribute undetectability (SID); Property-attribute
undetectability (SPD); IdP attribute undetectability (ID); Mutual IdP involvement
undetectability (IM); IdP-SP involvement undetectability (ISM); Session unlinkability (SL);
IdP service access undetectability (IL); IdP profile unlinkability (IIL); IdP/SP unlinkability (ISL)]

Table 6: Formalisation of requirements in our case study ($C_a \nvdash m$ means $\neg(C_a \vdash m)$;
$m \nleftrightarrow_a n$ means $\neg(m \leftrightarrow_a n)$; $*$ means for all possible values)



4.2 Formalizing the IdM systems 



We now present the formalisation of the different systems presented in Section 2.4. In
each case, the formalisation consists of an initial state and a trace. The initial state
$\{C^0_x\}_{x \in \mathcal{A}}$ extends the initial knowledge described in Figure 13 with respect to the
specific system. The trace Scenario consists of the messages transmitted during
registration at ii, registration at is, and two service provisions at bs, respectively.

We introduce the abbreviation $MS_{k^-}(m) := \{m, S_{k^-}(m)\}$ to denote a message along
with its signature, capturing both X.509 certificates [44] and signed SAML assertions.

4.2.1 Smart Certificates 



Figure 14 displays our formalisation of smart certificates (Section 2.4). The top part
defines the initial state $C^0$. In addition to the knowledge from Figure 13, Alice knows
her public key certificate $MS_{k^-}(i|_{al}, k^+|_{al}, n_c|_{\cdot})$ ($n_c|_{\cdot}$ represents additional information
in the certificate such as the validity date), and the corresponding private key $k^-|_{al}$.
The other items of initial knowledge are the contributions $n_{z,*}|_{*}$ to Alice's proof of
knowledge of $k^-|_{al}$, and additional information $n_a|_{\cdot}$, $n_b|_{\cdot}$ put in the attribute certificates
issued by ii and is.

The bottom part of Figure 14 defines the trace Scenario modelling communication
in smart certificates. The messages in the traces Reg$_1$ and Reg$_2$ correspond to those in
Figure 2(a); the messages in the trace ServProv correspond to those in Figure 2(b). We
model the proof that Alice knows the secret key corresponding to her public key as a
ZK proof with secret information $k^-|_{u}$ and public information $k^+|_{u}$.

[Figure 14: initial state, and trace Scenario := Reg$_1|^{\pi}$; Reg$_2|^{\eta}$; ServProv$|^{\xi}$; ServProv$|^{\zeta}$,
with Reg$_1$, Reg$_2$ and ServProv given as numbered message transmissions]

Figure 14: Formalisation of smart certificates: state $\{C^0_x\}_{x \in \mathcal{A}}$ and (valid) trace Scenario

4.2.2 Linking Service Model 



Figure 15 shows the formalisation of the linking service model (Section 2.4.2). The
top part defines the initial state $C^0$. This system introduces the linking service ls as
an additional actor: it has an address and a private/public key pair. ls and is have
publicly known identifiers $i|_{ls}$, $i|_{is}$ used in the referrals. The user database of ls,
modelled by domain $\nu$, contains an entry for the user containing only the identifier $i_{ls}|^{\nu}_{al}$.
User authentication to ls during registration is modelled by ls's knowledge of $i_{ls}|_{u}$; the
pseudonyms generated by the identity providers are modelled as $i_{ii,ls}|_{u}$ and $i_{is,ls}|_{u}$.
Alice's authentication at ii during service provision is modelled by the fact that ii knows
the identifiers $i_{ii}|^{*}_{u}$, $* \in \{\xi, \zeta\}$.

The bottom part of Figure 15 defines the trace Scenario modelling communication
in the linking service model; the registration and service provision phases correspond
to Figures 3(a) and 3(b) respectively. To prove authenticity, the identity providers sign
information for bs using their private key. bs forwards the authentication assertion from
ii to ls and is to prove that the user has authenticated. The referrals by ii and is include
random nonces $n|_{\cdot}$, $n'|_{\cdot}$ to ensure that bs cannot link different sessions by comparing
them.

[Figure 15: initial state, and trace Scenario := Reg$_1|^{\pi}$; Reg$_2|^{\eta}$; ServProv$|^{\xi}$; ServProv$|^{\zeta}$,
with Reg$_1$, Reg$_2$ and ServProv given as numbered message transmissions]

Figure 15: Formalisation of linking service model: state $\{C^0_x\}_{x \in \mathcal{A}}$ and (valid) trace
Scenario

The linking service model aims to satisfy a privacy requirement specifically about
the linking service, which we call LS attribute undetectability (LD). We can express
this requirement formally in a similar way to the SID, SPD, and ID requirements:

$C_{ls} \nvdash d_1|^{*}_{*} \wedge \ldots \wedge C_{ls} \nvdash d_6|^{*}_{*}$.

The linking service model in general is independent from message formats.
However, the authors also present an instantiation using the SAML 2.0 [25] and Liberty
ID-WSF 2.0 [42] standards. Our model captures that instantiation.



4.2.3 Identity Mixer 



The formalisation of the case study using Identity Mixer (Section 2.4.3) is shown in
Figure 16. The top part defines the initial state $C^0$; the most notable aspect of it is
Alice's secret identifier $i|_{al}$. The bottom part of Figure 16 models the communication
in Identity Mixer: registration as in Figure 4(a) and service provision as in Figure 4(b).
For our purposes, we can represent the commitment to Alice's secret identifier in the
first message by a hash $\mathcal{H}(i|_{u}, n_{c1,1}|_{\cdot})$. By inference rule ($\vdash$EI3), Alice learns a
credential from the issuing protocol linking her attributes to her secret identifier. For instance,
from message (2) she can derive

$cred^{i|_{u}}_{k^-|_{idp1}}(i_u|_{u}, d_1|_{u}, d_2|_{u}, d_3|_{u}; n_{c1,4}|_{\cdot}, n_{c1,5}|_{\cdot}, n_{c1,6}|_{\cdot})|^{\pi}$.

Note that this credential contains Alice's identifier as an additional attribute: it is
used later for anonymity revocation.

In the first message of service provision, again we represent the commitments to
Alice's secret identifier and attributes by hashes. For anonymity revocation purposes,
the first message additionally includes an encryption of the identifier $i_u|_{u}$ for the trusted
third party, with a condition $cnd|_{\cdot}$ attached describing when the anonymity of the
transaction may be revoked. The ZK proof in message (2) convinces bs that:

• Alice owns a credential, signed with ii's private key;

• the secret identifier and attributes in the credential correspond to the values or
commitments sent previously;

• the property $d_2{>}60|_{u}$ is satisfied;

• the encrypted message sent previously is encrypted using $k^+|_{ttp}$ and contains the
identifier in the credential.

The second ZK proof is similar. Note that the commitment $\mathcal{H}(i|_{u}, n|_{\cdot})$ in messages (1)
and (3) is the same, guaranteeing bs that the two certificates are indeed of the same
user.

[Figure 16: initial state, and trace Scenario := Reg$_1|^{\pi}$; Reg$_2|^{\eta}$; ServProv$|^{\xi}$; ServProv$|^{\zeta}$,
with Reg$_1$, Reg$_2$ and ServProv given as numbered message transmissions]

Figure 16: Formalisation of Identity Mixer: state $\{C^0_x\}_{x \in \mathcal{A}}$ and (valid) trace Scenario

Table 7: Comparison of privacy requirements claimed and satisfied by the various
systems. Filled check-mark: satisfied and claimed; empty check-mark: satisfied and not
claimed; filled cross: not satisfied and claimed; empty cross: not satisfied and not
claimed (see Table 2). †: may not be satisfiable efficiently depending on
non-privacy-related requirements.



4.2.4 Smartcard Scheme 



The Smartcard scheme (Section 2.4.4 1 is formalised in Figure 17 The top part of the 
figure defines the initial state C u . In this system, the user's personal information is 
exchanged on her behalf by a tamper-resistant smartcard. The smartcard is modelled 
as actor al. The smartcard has a certified private key; however, this private key is 
shared between different smartcards so it does not identify the user. Instead, the smart- 
card has a secret user identifier i\ ,, generated on the card, which is used to generate 
pseudonyms. The actors ii, is, and bs each have a private key and a corresponding 
public key certificate signed by the certification authority. 

Figure 17 (bottom) shows the information flows in the Smartcard scheme: registration
corresponds to Figure 5(a), service provision to Figure 5(b). Parties derive a
shared session key using authenticated key agreement based on public key certificates 
and exchanged randomness. The smartcard generates pseudonyms of Alice with re- 
spect to the two identity providers using hashes. In the service provision phase, q|. and 
dm|. represent bs's query: what information it needs, and how recent it should be. 

Note that [72] does not specify the exact format of the encrypted message to the
trusted third party for anonymity revocation. We chose an encryption of the user's
identifier at the address provider because this is most appropriate for our scenario. [72]
also does not specify how attributes are sent to the smartcard for caching; we chose to 
add one additional message to the registration phase containing all attributes. 



4.3 Results 

Table 7 presents the results of the analysis of the four reviewed systems with respect
to the requirements in Table 6. The results have been obtained using our Prolog
implementation (Section 3.4): for each system, we verified the validity of the trace Scenario
from the initial state, and checked which of the requirements hold in the resulting state.

Figure 17: Formalisation of Smartcard scheme: state $\{\mathcal{C}^0_a\}_{a \in \mathcal{A}}$ and (valid) trace
Scenario

4.3.1 Non-privacy requirements

The two non-privacy requirements attribute exchange (AX) and anonymity revocation
(AR) are satisfied in all systems. Indeed, attribute exchange is the basic requirement of
an IdM system. In smart certificates and the linking service model, ISL does not hold.
In this case, AR holds automatically because the service provider and identity providers
can link service accesses to user profiles (even without the help of the trusted third
party). In the two systems satisfying ISL (the Identity Mixer and Smartcard systems),
the transmission of an identifier encrypted for the trusted third party is necessary to
fulfil this requirement.

4.3.2 Detectability requirements 

The detectability requirements with respect to the service provider, property-attribute
undetectability (SPD) and irrelevant attribute undetectability (SID), check whether it is
possible to reveal properties of attributes without revealing their exact values, and to
reveal some but not all attributes. In smart certificates, the complete certificate is
transmitted, so neither requirement is satisfied. To address SID, the identity provider
could issue a separate credential for each user attribute. To partially address SPD, the
identity provider could issue several credentials proving common properties of attributes,
e.g. an "age > 60" credential. These latter credentials could be obtained during the
service provision phase, in effect transforming smart certificates into a
relationship-focused system. Indeed, this variant is discussed in [59]. Another possibility is to use
certificates that allow efficient proofs of knowledge, as in the Identity Mixer system.
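
The difference between showing a complete certificate and showing per-attribute or
property credentials can be illustrated with a toy model. The following Python sketch is
not part of any analysed system: the attribute names and values, the `disclosed_*`
helpers, and the dictionary representation of credentials are assumptions made purely
for illustration, and signatures are left out entirely.

```python
# Toy model: which attribute information does the service provider get to see?
# All names and values below are hypothetical; signatures are deliberately omitted.

ATTRIBUTES = {"age": 67, "city": "Eindhoven", "subscription": "gold"}


def disclosed_full_certificate(attributes, requested):
    """Smart-certificate style: the complete certificate is transmitted."""
    return dict(attributes)  # 'requested' cannot restrict what is shown


def disclosed_per_attribute(attributes, requested):
    """One credential per attribute: only the requested credentials are shown."""
    return {name: attributes[name] for name in requested}


def disclosed_property(attributes, name, predicate, label):
    """Property credential: the provider learns the label, not the exact value."""
    if not predicate(attributes[name]):
        raise ValueError("identity provider cannot certify this property")
    return {label: True}


full = disclosed_full_certificate(ATTRIBUTES, ["age"])
single = disclosed_per_attribute(ATTRIBUTES, ["age"])
prop = disclosed_property(ATTRIBUTES, "age", lambda v: v > 60, "age > 60")
```

In this sketch, `full` still contains the irrelevant attributes (the SID problem),
`single` contains only the requested attribute but with its exact value (the SPD
problem), and `prop` contains only the certified property.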

In the linking service model, SPD does not hold. Actually, the linking service 
model focuses primarily on involvement and linkability issues, leaving the details of 
the actual attribute exchange to underlying standards. However, in these standards (in 
particular, SAML) it is not possible to exchange properties of an attribute instead of 
its value. Recently, an extension to SAML to achieve this has been proposed [56].
Moreover, note that this is a problem in the instantiation of the model with SAML: 
other instantiations in which SPD does hold may be possible. 

IdP attribute undetectability (ID) and LS attribute undetectability (LD) also do not 
hold in the linking service model. This is because the linking service and the subscrip- 
tion provider both receive the signed authentication assertion from the address provider 
as guarantee that the user has logged in. However, in the SAML standard, the attributes 
are part of this signed message, so they also need to be forwarded. Technically, this 
could be easily solved by signing the attributes separately from the authentication infor- 
mation. Again, this problem is due to the instantiation of the model with SAML. Note 
that although ID is not explicitly claimed by the other IdM systems, they do satisfy it. 

4.3.3 Involvement requirements 

The involvement requirements state that an identity provider should not know about the 
user's involvement with other identity providers (mutual IdP involvement undetectabil- 
ity, IM) or service providers (IdP-SP involvement undetectability, ISM). In credential- 
focused systems, this is natural: the identity provider issues a credential to the user 
without involving others, and it is not involved in service provisions. Indeed, smart 
certificates, Identity Mixer and the Smartcard scheme all satisfy IM and ISM. 

In the linking service model, ISM does not hold because there is direct commu- 
nication between the identity providers and the service provider. In a variant of the 
model [26], the identity providers and service provider communicate indirectly via
the linking service. However, here the identity providers encrypt the attributes for the 
service provider (to preserve privacy with respect to the linking service), and so still 
need to know its identity. To prevent this, some kind of trusted intermediary (like the 
smartcard in the Smartcard scheme) seems to be necessary. 

Moreover, the linking service model does not satisfy IM. The subscription provider 
learns from the authentication assertion that the user has an account at the address 
provider (but not the other way round). This problem is also mentioned in [26]: while
"multiple [identity providers] must give [a service provider] the aggregated set of
attributes without knowing about one another's involvement", the authors concede that
"linked [identity providers] may become aware of just one other [identity provider] - 
the authenticating [identity provider] - during service provision". IM can be satisfied 
(within the standards used) if the subscription provider trusts the linking service to 
verify the address provider's signature. Another possibility to satisfy the requirement 
may be to use group signatures [27] for the authentication assertion from the address
provider. This solution prevents the subscription provider from learning at which iden- 
tity provider the user authenticated, but at the cost of reduced accountability. 

4.3.4 Linkability requirements 

Finally, we discuss the results for the linkability requirements. Session unlinkability 
(SL) is a natural requirement for relationship-focused systems, because the identity 
provider generates a new signature over the attributes at every service provision. In- 
deed, it holds for the linking service model. It also holds for the credential-focused 
Identity Mixer system because rather than showing the credential (which would allow 
linking), the user just proves the validity of properties using ZK proofs. In the Smart- 
card scheme, the smartcard is trusted to correctly send attributes from the credentials it 
knows. In the smart certificates scheme, however, the complete credential is shown so 
the requirement is not satisfied. IdP service access unlinkability (IL), in contrast, is
natural if the identity provider is not involved in service provision, i.e., for the
credential-focused smart certificates, Identity Mixer, and Smartcard schemes. It is less natural for
relationship-focused systems such as the linking service model. In this case, private 
information retrieval [28] can be used so that at least the non-authenticating identity 
provider does not learn which user he is providing attributes of. 

To achieve IdP profile unlinkability (IIL), global identifiers should be avoided both
in credential-focused and relationship-focused systems. Smart certificates, being based
on the user's public key certificate, do not satisfy this requirement. In Identity Mixer,
IIL holds because identity providers do not learn the identifiers of the credentials they
issue. In the Smartcard scheme, it holds because each identity provider learns a
different identifier based on a secret known only by the smartcard. In the linking service
model, the authenticating identity provider generates a session identifier and includes
it in the authentication assertion sent to the other identity provider. This forwarding
of the assertion can be avoided if identity providers trust the linking service to verify
the authentication assertion: identity providers can then issue attributes under different
session identifiers, and the linking service can assert the link between them. However,
note that this only partially solves the problem: identity providers are still both involved in
service provision, so they may link using timing information. Indeed, just eliminating
global identifiers does not fix IIL in our model.
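
The effect of global versus provider-specific identifiers on linking can be sketched as
follows. This Python fragment is only an illustration under stated assumptions: the
`pseudonym` and `linkable` helpers, the use of SHA-256, and the concatenation of a
card secret with the provider name are hypothetical choices, since the analysed
scheme only specifies that pseudonyms are generated using hashes.

```python
import hashlib
from itertools import combinations


def pseudonym(secret: bytes, provider: str) -> str:
    """Provider-specific identifier derived from a secret held only by the card."""
    return hashlib.sha256(secret + provider.encode()).hexdigest()


def linkable(profiles):
    """Return the pairs of profile holders that share at least one identifier."""
    return {
        (a, b)
        for (a, ids_a), (b, ids_b) in combinations(sorted(profiles.items()), 2)
        if ids_a & ids_b
    }


# A global identifier, e.g. a public key certificate shown to every party.
global_ids = {"ii": {"cert-alice"}, "is": {"cert-alice"}}

# Per-provider pseudonyms derived from a secret only the smartcard knows.
secret = b"card-secret"
pseudo_ids = {p: {pseudonym(secret, p)} for p in ("ii", "is")}
```

Here `linkable(global_ids)` yields the pair of identity providers, whereas
`linkable(pseudo_ids)` is empty. The sketch covers only linking through common
identifiers; linking through timing information is outside its scope.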

IdP-SP unlinkability (ISL) does not hold for the same two systems that also do not 
satisfy IIL, and for similar reasons. In smart certificates, all parties learn the user's 
public key certificate; in the linking service model, the service provider learns the 
session identifier from the authenticating identity provider. The other systems satisfy 
it: in Identity Mixer, not even the issuer of the credential can recognise a ZK proof 
about it; in the Smartcard scheme, the smartcard ensures that the information flow 
between identity providers and service providers is restricted to just the attributes. 

However, as a consequence of ISL holding, extra work is needed to achieve ac- 
countability in two respects. First, a message encrypted to a trusted third party is 
provided to the service provider to achieve anonymity revocation. Second, although 
service providers do not learn a credential identifier, they do need assurance that the
credential has not been revoked. In the Smartcard scheme, the suggested solution is to 
let the smartcard perform a regular revocation check. Similarly, in the Identity Mixer 
system, credentials can be given a short lifetime and be checked for revocation at re- 
issuing |[l"8l . In both cases, revocation is not immediate. 

For Identity Mixer, two proposals for immediate revocation have been made [22].
The first proposal is to include a serial number in the credential. The credential can
be issued so that the identity provider either learns this serial number or does not. In the
former case, ISL is not satisfied. In the latter case, ISL holds but the credential
cannot be revoked if the user loses her serial number or does not wish to participate.
Depending on the situation at hand, this latter behaviour may not be acceptable. The
second proposal is to use a ZK proof that the credential is on a public list of valid
credentials [18]. This allows revocation without the user's help while not breaking
ISL; however, the user needs to keep track of all revoked credentials in the system, 
and despite recent advances lfl"8l this may still not be efficient enough. Note that the 
Smartcard scheme does not support immediate revocation at all. 



5 Discussion 

In this section we discuss two aspects of our analysis: first, whether our analysis covers
all relevant requirements (Section 5.1); and second, what effort is needed to apply the
analysis to other systems (Section 5.2).



5.1 Privacy Requirements 

This work analyses requirements of IdM systems relating to the knowledge of personal 
information. We have elicited the requirements in Table 1 by analysing both IdM
systems [8, 26, 72] and taxonomies of privacy requirements iflOl |4D . We now consider
whether these requirements cover all relevant aspects of IdM systems. 

The requirements of Table 1 fall into three categories: detectability, involvement,
and linkability requirements. (AX can be seen as a detectability requirement, and AR 
as a linkability requirement.) This paper presents a model in which these three kinds of 
requirements can be expressed. We first discuss completeness with respect to properties 
expressible in the model. Next, we consider requirements that cannot be expressed 
in the model but may nonetheless be interesting, and discuss how the model may be 
adapted to analyse them. 

Table 8 (at the end of this paper) indicates which formal requirements in Table 6
capture which properties in the formal model. The first group of columns indicates 
the coalition with respect to which a requirement is defined; the next groups list the 
detectability, involvement, and linkability aspects that it entails. 

First consider detectability requirements. With respect to bs, all personal informa- 
tion is required to be either detectable by AX, or undetectable by SID and SPD (except 
for dq, which bs can always detect by definition of the case study). Similarly, identity 
providers can detect attributes they endorse by definition of the case study, but no
others by ID. (Undetectability of endorsed attributes would be a requirement for the blind
certification [8] feature of the Identity Mixer scheme as discussed in Section 2.4.3.)
There are no detectability requirements with respect to ttp, or about the transaction
details dj. In fact, these aspects would not produce relevant results because ttp never
learns any attributes, and bs never communicates any transaction details.

Involvement requirements do not cover ttp or al: the involvement of ttp is publicly
known, and Alice's involvement is covered by linkability. For identity providers,
there are involvement requirements about all remaining parties, i.e., the other identity 
provider and the service provider. Usually, service providers assess trustworthiness 
of user attributes by considering which identity provider endorsed them; hence we do 
not regard involvement requirements with respect to the service provider as important. 
(Among the analysed systems, only the Smartcard scheme would satisfy them.) 

Linkability requirements capture associations by coalitions of actors. Clearly, at 
least ii and is are needed to associate K and fi; IIL states that without help of others, 
they cannot. There is no requirement about when bs helps them with this; as it turns 
out, this help never makes a difference. Linkability between user databases and service 
provisions is defined with respect to the respective identity providers, and with respect 
to a coalition of all identity and service providers. Considering other coalitions would 
not reveal interesting differences in the systems we analyse. Similarly, no requirement 
involves ii or is in linking the service provisions to each other; in practice, an identity 
provider would link service provisions to each other by first linking them to its own user 
profile, which is covered by IL. Finally, AR requires linking the service provisions to 
K and not to jj.; this is an arbitrary choice made in the definition of the case study. 

The above discussion shows that our requirements capture the interesting differ- 
ences that our model can express; we now discuss some aspects that it cannot express. 
First consider requirements related to data minimization. Apart from explicitly trans- 
ferred information, i.e., the user's attributes, we analyse one particular kind of implic- 
itly transferred information; namely, involvement requirements. However, other kinds 
may be of interest as well. For instance, the number of transactions performed by a 
user may be privacy-sensitive, as may be the mere date and time of certain activities 
(see, e.g., privacy issues in smart metering systems [62]). Knowledge about numbers
of transactions can be expressed in our model; date and time may be appended as
"tags" to communication. Concerning linkability minimization, we consider only
non-probabilistic linking resulting from common identifiers; however, it would also be
interesting to consider statistical linking [36] based on overlapping (but non-identifying)
information in profiles or the combination of credentials held by a user. Taking these
aspects into account would require a probabilistic associability relation. Finally, we
stress that we consider privacy strictly in the sense of minimizing knowledge of
personal information; in Section 6 we discuss other aspects of privacy.

5.2 Applicability of the Analysis Method 

Analysing a new system means formally modelling the communication that takes place. 
This involves expressing the communication in terms of messages modelled in the 
three-layer model, and modelling an initial state expressing the knowledge required 
for the communication (personal information, but also information generated during 
execution, for instance, session identifiers and nonces). This latter task is mainly a 
matter of industrious bookkeeping. 

The main difficulty in modelling a system is in making sure the model accurately 
captures the cryptographic primitives that are used. We draw upon our experiences 
in extending the basic formal model of fTD to the present analysis to give the reader a 
flavour of what this entails. Some operations are easily expressible in terms of standard 
primitives. For instance, for our purposes, commitments can be modelled as hashes 
because they satisfy the same inference rules. Other primitives may have different 
variants requiring different inference rules. For instance, signature schemes may be 
with appendix or with message recovery [53]: in the first case, the message is needed
for signature verification, in the second case it is not. Thus, attention is needed to 
ensure that the appropriate model of a primitive is chosen. 
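
The difference between the two signature variants can be made concrete with a small
sketch. The Python code below is an illustration only and does not reproduce the
inference rules of our deductive system: the tuple encoding of messages and the
functions `can_verify` and `learned_messages` are hypothetical names introduced here.

```python
# Messages are nested tuples: ("sig", message, signer) stands for a signature,
# ("pub", signer) for the signer's public key. Simplified, hypothetical rules.


def can_verify(knowledge, signature, variant):
    """Testing: can the actor check the signature with what it knows?"""
    _, message, signer = signature
    has_key = ("pub", signer) in knowledge
    if variant == "appendix":
        return has_key and message in knowledge  # message needed to verify
    if variant == "recovery":
        return has_key  # verification recovers the message itself
    raise ValueError(variant)


def learned_messages(knowledge, variant):
    """Elimination: which signed messages can be extracted from signatures?"""
    learned = set()
    for item in knowledge:
        if isinstance(item, tuple) and item[0] == "sig":
            _, message, signer = item
            if variant == "recovery" and ("pub", signer) in knowledge:
                learned.add(message)
    return learned


sig = ("sig", "age=67", "ii")
knows = {sig, ("pub", "ii")}
```

With `knows` as above, an actor can neither verify nor open the signature in the
"appendix" variant, but can do both in the "recovery" variant. Choosing the wrong
variant would therefore change which personal information an actor is modelled to learn.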

When modelling primitives, it is helpful to look at existing formalisations, e.g. 
using deductive systems [29, 37] or equational theories [3, 12]: they can usually be
translated to the three-layer model. For instance, the formalisation of labelled encryp- 
tion used in this work is based on 12D . Special attention should be paid to the testing 
rule. Deductive systems do not usually consider testing; equational theories can include 
rules for signature verification (e.g., [32]) which translate to testing rules in the three- 
layer model, but may include only those rules that were relevant to the analysis at hand. 
Thus, to obtain a complete set of testing rules, one needs to take a lower-level look at 
the operation of the primitive. In addition, note that existing formalisations (e.g., ED ) 
may not explicitly model randomness in non-deterministic primitives; however, in our 
model this is needed because we assume messages to be deterministic. 

In some cases, no suitable existing formalisation of a cryptographic primitive may 
be available. In such a case, the general (security) definition of the primitive (e.g., [30]
for ZK proofs) generally suffices for obtaining a description for the language C c . How- 
ever, different implementations of a primitive may give rise to different inference rules. 
Thus, to obtain inference rules, one needs to consider the particular implementation 
used in the protocol under analysis. In our experience, this is feasible. Note that
because we are only interested in privacy aspects of the primitives, usually some
simplifications can be made. See Appendix A for two examples: ZK proofs and anonymous
credentials.

Our Prolog implementation of the analysis method makes it easy to modify, add, 
and remove inference rules. However, it only supports primitives that satisfy some 
technical assumptions: (1) rules should be either construction, testing or elimination 
rules; (2) testing rules exist for all sub-messages that appear as preconditions for elim- 
ination rules; and (3) there are no infinite cycles consisting of testing and elimination 
rules. These assumptions are true for the primitives considered in this work, and we
expect that they will also be true for many other primitives used in other IdM systems. A
generalisation of our implementation beyond these assumptions is left as future work.

The analysis of additional case studies is straightforward in theory, but involves 
some work in practice. Our analysis method and its implementation are designed to 
verify properties of particular elements in a particular trace; both need to be modi- 
fied for other case studies. However, this task can be lightened by exploiting Prolog's 
programming features. For instance, our case study involves two traces of service pro- 
visions, which are almost the same; in our implementation of the model of the systems, 
both are generated by one Prolog predicate which takes the variable elements as input. 
This approach can also be used to generate traces with more service provisions, regis- 
trations, or actors, and to generate lists of checks that need to be performed for a given 
privacy requirement. We expect that this does not raise any serious problems, but leave 
a systematic approach as future work. 
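
The idea of generating several similar traces from one parameterised definition can be
sketched as follows. The code is a Python analogue written for illustration, not our
Prolog implementation: the message format `(sender, receiver, payload)`, the payload
labels, and the functions `service_provision` and `scenario` are assumptions.

```python
# Hypothetical message format: (sender, receiver, payload). The actor names
# follow the case study; the payloads are placeholders, not protocol messages.


def service_provision(session, user, provider, attributes):
    """Generate the messages of one service provision from its variable parts."""
    return [
        (user, provider, ("request", session)),
        (provider, user, ("query", session, tuple(attributes))),
        (user, provider, ("response", session, tuple(attributes))),
    ]


def scenario(provisions):
    """Concatenate any number of service provisions into one trace."""
    trace = []
    for session, (user, provider, attributes) in enumerate(provisions, start=1):
        trace.extend(service_provision(session, user, provider, attributes))
    return trace


trace = scenario([("al", "bs", ["d1"]), ("al", "bs", ["d2", "d3"])])
```

Adding a further service provision then amounts to adding one more tuple to the
argument of `scenario`, rather than writing out another trace by hand.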



6 Related Work 



We discuss related work on privacy in identity management (Section 6.1) and formal
methods (Section 6.2).

6.1 Privacy in Identity Management



The principle of protecting privacy by minimizing the knowledge held by actors in 
IdM systems is well-established in the literature. It has been recognised as a basic
"law of identity" for the design of IdM systems [23]. Hansen et al. [41] argue that
privacy-enhancing IdM systems should satisfy a high level of data minimization with
user-controlled linkage of personal data, and by default unlinkability of different user
actions. Pfitzmann and Hansen [61] define privacy-enhancing identity management as
preserving unlinkability between user profiles. Finally, in a general survey, Alpar et
al. [4] identify three main privacy issues in identity management: linkability across
domains, identity providers knowing user transactions, and violation of proportionality 
and subsidiarity (i.e., the exchange of minimal information needed to achieve a certain 
goal). These three issues correspond to our three kinds of privacy requirements: linka- 
bility, involvement and detectability, respectively. In contrast to the vision of minimiz- 
ing actor knowledge, Landau and Moore have recently argued that preventing service 
providers from collecting transaction data may not be desirable because it prevents 
the adoption of IdM systems in practice [48]. This falls into a broader discussion on 
incentives of participants in IdM systems (SJED that is out of scope for this work. 

In this work we focus on minimizing knowledge of personal information by tech- 
nical means; other works address other aspects of privacy. Landau et al. [47] argue 
that privacy protection can be achieved not just technically, but also by legal and policy 
means. Hansen et al. [41] argue that apart from ensuring data minimization,
privacy-enhancing IdM systems should also make the user aware of what information is
exchanged about her and who can link it; and allow the user to control these aspects.
Bhargav-Spantzel et al. ifTTIl stress the importance of trust between different parties in 
identity management, and in particular, trust of the user in other parties' handling of her 
personal information. Our method can complement this demand for transparency by 
providing a precise view on how the choice of IdM system impacts privacy. However, 
interestingly, recent research in behavioural economics suggests that offering trans- 
parency to users might actually reduce their privacy by inducing them to release more 
information [15].

Compared to general discussions, far fewer assessments of privacy in particular
IdM systems have been performed. Several studies include privacy in a more general,
high-level comparison of IdM systems [1, 43]. These studies consider a broader
concept of privacy than we do, also considering transparency and control aspects. However,
the analyses have been performed by hand in an informal and high-level (and thus,
subjective) way. Concerning knowledge of personal information, [1] considers three
different criteria: "usage of pseudonyms/anonymity"; "usage of different pseudonyms"
and "user [is] only asked for needed data" (judged on a yes/no scale). [43] considers
two: "directed identity"/"pseudonymous/anonymous use" and "minimal disclosure"
(judged on a ++ to -- scale). In contrast, we offer a much more granular comparison
that is performed using formal methods, and is thus automated, verifiable, and as a
consequence, more objective.

Some formal works on privacy in identity management are available. In [61],
privacy-enhancing identity management is defined as preserving unlinkability between
different user profiles, and the meaning of linkability and its relationship with related
concepts is explored in a semi-formal way. Their informal definitions formed the basis
of our original work [70] on representing knowledge of personal information. Other
formal work on identity management has mainly focused on safety properties with
respect to misbehaving attackers, rather than privacy properties with respect to insiders
who follow the protocol specification. In this context, unlinkability [50, 69] and
undetectability ETI properties have been considered for Identity Mixer and related
anonymous credential schemes. For SAML [25], a standard for the exchange of identity
information between identity and service providers used in the linking service model,
secrecy properties have been considered [6]. Our work differs from this latter category
in two respects: first, we define properties in a general setting, allowing comparisons 
between different systems; and second, we distinguish between the roles of different 
insiders rather than considering one outsider, enabling us to express which (coalitions 
of) actors can associate or detect certain information, and which cannot. 

6.2 Formal Methods 

Formal methods have been around for many years as an important tool to analyse
security of communication in IT systems [3, 11, 52, 60]. Most formal methods rely
on two basic ideas: the Dolev-Yao attacker model and state exploration techniques. 
In the Dolev-Yao attacker model, one considers communication messages using ide- 
alised cryptographic primitives, and an attacker who controls some or all communi- 
cation channels between legitimate parties (meaning that he can insert and suppress 
messages at will, and fabricate messages based on his observations). The reasoning 
that the attacker performs to fabricate messages can be described by deductive systems 
(e.g., [29, 37]) or equational theories (e.g., [3, 12]). State space exploration techniques
assess the system security by analysing all possible evolutions of a given system in the 
presence of a Dolev-Yao attacker. The requirements of a system are then verified by 
checking whether any of the states that can be reached by the system correspond to an 
attack (e.g., the attacker knows a secret, or has succeeded in impersonating a legitimate 
user). Several process algebras [3, 14, 54] provide machinery to perform state space
exploration. Other approaches have also been proposed, e.g., using induction [60].

Recently, more and more work has focused on using these techniques for privacy
properties, in application domains such as electronic toll collection [31], e-voting [32, 33],
RFID systems [16], and Direct Anonymous Attestation [66]. These proposals
express privacy in terms of "experiments": slightly different settings for the execution 
of the same protocol that should be indistinguishable to an attacker. For instance, in 
electronic toll collection, an attacker should not be able to distinguish a setting in which 
a first car takes a left road and a second car takes a right road from a situation in which 
the first car takes the right road and the second car takes the left road. Similarly, in 
Direct Anonymous Attestation, an attacker should not be able to distinguish a signature 
produced by one trusted platform module from a signature produced by another one. 

Compared to these existing formal methods, our work differs in two crucial ways. 
First, instead of looking for attack evolutions for each property separately using state 
space exploration, we consider one single evolution without misbehaving actors, and 
define our properties in terms of that. Second, to interpret messages as personal infor- 
mation, we reason about them using our three-layer model and own deductive system 
instead of standard equational theories or deductive systems. We argued that our deduc- 
tive system is as expressive as standard ones; it would also be interesting to investigate 
the formal relation between our properties verified with our method, and analogous 
experiment-based properties verified with state space exploration. 

We now discuss existing work modelling the primitives covered by our deductive 
system. Labelled encryption is a straightforward extension of normal encryption; our 
model is similar to the one in lETI . The internals of (incorrect) protocols for authen- 
ticated key agreement have over the years proven a popular target for analysis using 
formal methods ifTTl I5T1 l60l : however, we have not found prior works that formally
model the external behaviour of (correct) authenticated key exchange protocols in a 
larger system. 

For ZK proofs, both high-level and low-level formalisations exist. In BUI , a low- 
level model of the operation of ZK proofs is given; however, it cannot be used for 
knowledge derivation; also, questions have been raised about its technical correctness 
GTl . Two high-level formalisations of ZK proofs have been proposed 171 ETTl that, 
like ours, allow proofs of a restricted set of properties. The equational theory in [7]
models the verification of ZK proofs (as our testing rules); the model of ETI only 
allows correct ZK proofs to take place and does not express their verification. The 
latter simplification is not suitable for our method, because verification expresses that 
an actor learns information in new contexts. Note that both model "signature proofs of
knowledge" rather than Σ-proofs; however, our methods can also capture that variant.

Three recent proposals [21, 50, 66] are relevant for our formal model of
anonymous credentials. [50] only considers operational aspects of anonymous credentials.
[21] models credentials and their showing protocol. The model of credentials is similar
to ours, and it includes a rule to obtain a credential from a committed message as in our
low-level formalisation (Appendix A.2). The showing protocol is formalised in terms
of ZK proofs. However, credential issuing is not considered in [21]. Finally, Smyth
et al. [66] model joining and signing protocols for ECC-based Direct Anonymous
Attestation, which are very similar to issuing and showing protocols for BM-CL-based
anonymous credentials [20]. Although our model is based on a different signature
scheme [19] and specified at a higher level, their model of signatures generally
corresponds to our model of signatures from committed messages in Appendix A.2.



7 Conclusion & Future Work 

The contributions presented in this work are threefold. First, we have captured the 
concept of privacy by data minimization in a detailed and comprehensive set of re- 
quirements for IdM systems. Second, we have developed a formal analysis method to 
verify such requirements. Finally, we have formally analysed and discussed the privacy 
of four representative IdM systems from the literature. 

In Table 1 we have presented a list of eleven detailed requirements for IdM sys-
tems which address privacy by data minimization. The requirements have been elicited 
by analysis of both existing taxonomies and existing IdM systems; to the best of our 
knowledge, we are the first to provide such a comprehensive and detailed taxonomy 
of requirements. Two requirements capture information that should be learned: the 
attributes that a service provider needs, and the link between user and service access in 
case of anonymity revocation. The other requirements capture information that should 
not be learned, divided into three categories: detectability (which actors know which 
attributes), involvement (who may know who else is involved in the identity exchange), 
and linkability (which coalitions can link information from different contexts). We 
have shown that the relevant requirements for these categories have been captured in 
the model. An interesting extension in this direction is to systematically consider other 
kinds of privacy aspects, such as knowledge of the number or timings of transactions, 
and uncertain knowledge of associations based on partially overlapping profiles using 
probability. 

We have presented a three-layer model and deductive system to effectively express 
how actors learn personal information from communication. The model shows what
information an actor knows, and what information he knows to be about the same per-
son. The main feature of this model is that it captures not only the information itself, 
but also its contents and the context in which it is used. Our earlier works in this di-
rection [70, 71] considered communication using basic cryptographic primitives only;
in this work we have also considered reasoning about attribute properties, additional 
cryptographic primitives, and cryptographic protocols. In particular, our work presents 
the first model to reason about knowledge arising from ZK proofs and anonymous cre- 
dential issuing protocols. We needed to extend our previous work to model the systems 
analysed in this work; similar extensions may be necessary to analyse other systems. 
We have provided some advice based on our experiences. 

We have used this model to verify the elicited requirements for four representa- 
tive IdM systems: smart certificates, the linking service model, Identity Mixer, and a 
system based on smartcards. We analysed these systems against the eleven identified 
requirements (Table 7). It is worth noting that only 17 of these 44 requirements are
mentioned as (parts of) requirements in the design of the respective IdM systems. In 
one instance, we found such a requirement not to hold (a problem which is also men- 
tioned by the authors of the system themselves). In another instance, we clarified the 
exact setting in which a requirement holds, which may be unrealistic for performance 
or accountability reasons. The remaining 27 of the 44 entries of the table do not cor- 
respond to requirements explicitly stated by the designers of the IdM systems. In this 
work, we have established whether they hold or not, leading to a more comprehensive 
analysis and comparison of IdM systems. 

As future work, the scope of the analysis can be extended along several direc- 
tions. First, besides the privacy aspects already mentioned, the method can be ex- 
tended to analyse assurance and provability aspects. Second, additional IdM systems 
like U-Prove [58] and the STORK Platform (https://www.eid-stork.eu/) as well
as other variants of the systems we considered can be analysed and compared. Finally, 
to evaluate the performance of the method, we plan to apply it to a real case scenario. 

We conclude by remarking that the privacy aspects in Table 7 make up only one of
the several factors that need to be considered when choosing the "best" IdM system. 
In fact, the true design challenge for privacy-enhancing IdM systems lies in achieving 
privacy while at the same time guaranteeing other security aspects such as confiden- 
tiality, integrity, availability, accountability and assurance. However, we hope that the 
precise and detailed privacy assessment provided by our method contributes to making 
the privacy factor an important one in the design and selection of IdM systems.



A Inference Rules for Zero-Knowledge Proofs and Credential Issuing

In this appendix we show how our models of ZK proofs and the credential issuing 
protocol are derived. 

A.1 Zero-Knowledge Proofs

ZK proofs allow a prover to prove to a verifier that he knows some secret information 
satisfying certain properties with respect to some public information, without revealing 
any information about the secret. For instance, consider a large group of prime order n
generated by a group element g. Note that given value h, it is infeasible to determine
the discrete logarithm $x = \log_g h$; this property can be exploited to build a public key
cryptosystem in which values of h are public keys, and the corresponding values of x
are private keys. A prover who knows x as well as n, g, and h can engage in a ZK proof
protocol with a verifier who just knows n, g, and h; when the protocol has finished
successfully, the verifier is convinced that the prover knows the value of x, without
learning anything about its value.

Figure 18: Schnorr proof of knowledge and its formal model

The general definition of ZK proofs leaves open different kinds of implementations; 
we model a particular kind of ZK proof called Σ-protocols [30]. Σ-protocols are three-
move protocols in which the prover first sends a commitment; the verifier responds
with a randomly generated challenge; and finally the prover sends a response. The ZK
proofs used in the systems analysed [8, 19, 20, 38] are of this kind.

An example Σ-protocol is the classical protocol due to Schnorr to prove knowledge
of $x = \log_g h$ in the setting given above (Figure 18(a)). The prover computes a random
u and sends a commitment $a = g^u$ to the verifier. The verifier responds with a random
challenge c. The prover calculates response $r = u + cx$. The verifier convinces himself
that the prover indeed knows the secret x by checking that $g^r = a h^c$ using the response,
commitment and public information. The prover can only calculate a valid response if
he knows the secret; also, the response does not reveal any information about x [63].
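
As an illustration, the three moves and the verification check of the Schnorr protocol can be sketched in Python. This is a toy sketch and not part of the formal model: the group parameters P, N, G and all function names are assumptions chosen for illustration, and the group is far too small to offer any security.

```python
import secrets

# Toy parameters (assumptions for illustration only):
# P = 2N + 1 is a safe prime and G = 4 generates the subgroup of prime order N.
P, N, G = 2039, 1019, 4


def keygen():
    """Return (x, h) with private key x and public key h = g^x."""
    x = secrets.randbelow(N - 1) + 1
    return x, pow(G, x, P)


def commit():
    """Prover move 1: pick randomness u and return (u, a = g^u)."""
    u = secrets.randbelow(N)
    return u, pow(G, u, P)


def challenge():
    """Verifier move 2: pick a random challenge c."""
    return secrets.randbelow(N)


def respond(x, u, c):
    """Prover move 3: response r = u + c*x modulo the group order."""
    return (u + c * x) % N


def verify(h, a, c, r):
    """Verifier check: g^r == a * h^c in the group."""
    return pow(G, r, P) == (a * pow(h, c, P)) % P


if __name__ == "__main__":
    x, h = keygen()
    u, a = commit()
    c = challenge()
    r = respond(x, u, c)
    print("proof accepted:", verify(h, a, c, r))
```

The check succeeds for an honest prover because $g^r = g^{u + cx} = g^u (g^x)^c = a h^c$.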

We formally model ZK proofs at a high level using the primitive $ZK(m_1; m_2; m_3; n)$.
The secret information $m_1$ and public information $m_2$ are described in terms of
messages; the ZK proof proves that the public information has a certain message structure
with respect to the secret information. In addition, the proof can show that context
data items d occurring in $m_1$ satisfy properties $\psi_k(d)$, listed in $m_3$. Finally, n
represents randomness; in Σ-protocols, $n = \{n_p, n_v\}$, representing the prover's randomness
$n_p$ for the commitment and the verifier's randomness $n_v$ for the challenge. For instance,
$ZK(k^-; k^+; \emptyset; \{n_p, n_v\})$ is a proof of knowledge of the private key $k^-$ corresponding to
public key $k^+$ with no properties and contributed randomness $n_p$, $n_v$. From this
high-level description in terms of structure of messages, the low-level description follows
implicitly. For instance, in a setting where public/private key pairs are of the form
$(h, x = \log_g h)$, the proof $ZK(k^-; k^+; \emptyset; \{n_p, n_v\})$ corresponds to a proof of knowledge
of the discrete logarithm $x = \log_g h$ of h like the Schnorr protocol. Figure 18 shows the
Schnorr protocol and its formal model in this setting.

[Figure 19, rules surviving extraction:
$\frac{C_a \vdash ZK(m_1; m_2; m_3; \{n_p, n_v\})}{C_a \vdash m_3}$ (⊢EZ1'),
$\frac{C_a \vdash ZK(m_1; m_2; m_3; \{n_p, n_v\})}{C_a \vdash n_v}$ (⊢EZ2');
testing rules $ZK(m_1; m_2; m_3; \{n_p, n_v\})$ ?=> $m_2$ and $ZK(m_1; m_2; m_3; \{n_p, n_v\})$ ?=> $n_p$;
rules (⊢EZ3'), (⊢EZ4') and (⊢CZ') as described in the text]

Figure 19: Complete set of inference and testing rules for ZK ($C_a$ a set of context
messages; $m_*$, $n_*$ context messages; $p_i$ properties of $m_1$, i.e., every
$p_i = \psi_j(m_k) \in D^c$ for some j, k)



In Figure 19 we present a set of inference rules for the ZK primitive. We first
explain them, and then argue that for privacy purposes and under certain assumptions,
it suffices to consider the smaller set of rules presented in Figure 9. We begin by discussing
what messages can be derived from a ZK transcript $ZK(m_1; m_2; m_3; \{n_p, n_v\})$ using
elimination and testing rules. In general, secrecy of the properties $m_3$ to be proven is
not a privacy goal in the ZK literature; thus, the particular property proven by a ZK
proof will influence the format of the messages transmitted. As an over-estimate, we
allow any actor to derive the properties $m_3$ from the transcript (⊢EZ1'). The verifier
randomness $n_v$ is transmitted as challenge, and so can be derived from the transcript
(⊢EZ2'). In particular, any observer who knows the public information $m_2$ can also
verify the ZK proof; hence $ZK(m_1; m_2; m_3; \{n_p, n_v\})$ ?=> $m_2$. Because both parties are
already assumed to know $m_2$ before the start of a ZK proof, it does not need to follow
from the transcript. In fact, it is usually not possible to derive $m_2$, as can be seen in the
Schnorr example.³

The fact that the protocol is zero-knowledge means that a verifier (who knows $m_2$,
$m_3$ and $n_v$) should not be able to learn anything about $m_1$. In fact, if there are
several possible secrets $m_1$ corresponding to public information $m_2$, then the probability
distribution for protocol transcripts is required to be independent from $m_1$. Thus, it is
impossible to test $m_1$ from the transcript. (Of course, if $m_2$ determines $m_1$, e.g., if they
are a public/private key pair, then $m_1$ can be derived using $m_2$, but this is not due to
the ZK proof.) Because the verifier, who knows all components of the ZK proof except
$m_1$ and $n_p$, cannot deduce anything about the secret $m_1$, any inference rule to derive it
needs to have $n_p$ as a prerequisite. By a similar line of reasoning, if $m_1$ can be derived
from $n_p$, then an inference rule for $n_p$ needs $m_1$, or it needs to be a testing rule. In
fact, as can be seen from the example of the Schnorr proof, in Σ-protocols all these
inferences can be made: $m_1$ can be derived directly from $n_p$ (⊢EZ3') and vice versa
(⊢EZ4'), and $n_p$ can be tested.
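
For the Schnorr instance, these inferences can be written out explicitly. The following is a sketch under the assumptions that $n_p$ corresponds to the prover's random u, $m_1$ to the secret x, the transcript is $(a, c, r)$ with $a = g^u$, all exponent arithmetic is modulo the prime group order n, and $c \neq 0$ so that c is invertible:

```latex
\begin{align*}
r &\equiv u + cx \pmod{n} && \text{response contained in the transcript}\\
x &\equiv (r - u)\,c^{-1} \pmod{n} && \text{secret from prover randomness, cf. } (\vdash\mathrm{EZ3'})\\
u &\equiv r - cx \pmod{n} && \text{prover randomness from secret, cf. } (\vdash\mathrm{EZ4'})\\
g^{u'} &\overset{?}{=} a && \text{test of a candidate value } u' \text{ for } n_p
\end{align*}
```

Each line uses only values that the corresponding rule assumes to be known, together with the transcript itself.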

To generate a transcript $ZK(m_1; m_2; m_3; \{n_p, n_v\})$ of a Σ-protocol, an actor needs
$n_p$ for the commitment; $n_v$ for the challenge; and both pieces of randomness and
the private information for the response (⊢CZ'). (Technically, the public information
is not needed.) Similarly, for determinability of the message transmission
$a \to b : ZK(m_1; m_2; m_3; \{n_p, n_v\})$, the prover needs $\{m_1, n_p\}$ in addition to the
communication addresses $\{a, b\}$; the verifier needs $n_v$.

³ However, note that if we append h to the first message of the Schnorr proof, it is still a valid Σ-protocol,
but now one from which the public information can be derived.

$$\frac{C_a \vdash k^+ \quad C_a \vdash m_1 \quad C_a \vdash n_a}{C_a \vdash S_{k^-}(m_1, n_a)}\ (\vdash\mathrm{CS})
\qquad
\frac{C_a \vdash S_{k^-}(m_1, n_a) \quad C_a \vdash \{k^-, m_2, n_b\}}{C_a \vdash S_{k^-}(m_1, m_2, n_a, n_b)}\ (\vdash\mathrm{CS'})$$

Figure 20: Inference rules for signature scheme with signatures on committed values
($C_a$ a set of context messages; $k^+/k^-$ a public/private key pair; $m_*$, $n_*$ context
messages)

There are two aspects the above model does not take into account. First, from two 
ZK proofs using the same prover randomness, the secret can be derived: in case of the 
Schnorr proof, by computing $(r - r')/(c - c')$ from transcripts $(a, c, r)$ and $(a, c', r')$.
This is a general property of Σ-protocols called special soundness. However, if the
prover always honestly generates his randomness, then this is very unlikely and we 
can safely ignore it. Second, an actor can also "simulate" a ZK proof transcript without 
knowing the secret information by first generating the challenge and response and from 
that determining the commitment. Such a simulation has the exact same form as a ZK 
proof, but because the randomness in the commitment is unknown, it cannot be used 
to derive a secret corresponding to the public information. Such simulations are very 
unlikely to correspond to ZK proofs that really took place, so they are not relevant for 
knowledge analysis. 
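
Both aspects can be made concrete for the Schnorr proof. The sketch below is illustrative only: the toy group parameters P, N, G and the function names are assumptions, not part of the formal model. It extracts the secret from two transcripts that share a commitment, and it simulates an accepting transcript without knowing the secret by choosing challenge and response first.

```python
import secrets

# Toy parameters (assumptions for illustration only, not secure):
# G = 4 generates the subgroup of prime order N in Z_P^*, with P = 2N + 1.
P, N, G = 2039, 1019, 4


def verify(h, a, c, r):
    """Schnorr verification: g^r == a * h^c."""
    return pow(G, r, P) == (a * pow(h, c, P)) % P


def extract_secret(c1, r1, c2, r2):
    """Special soundness: x = (r - r') / (c - c') mod N, requires c != c'."""
    inv = pow((c1 - c2) % N, -1, N)  # modular inverse (Python 3.8+)
    return ((r1 - r2) * inv) % N


def simulate(h):
    """Build an accepting transcript (a, c, r) without knowing x."""
    c = secrets.randbelow(N)
    r = secrets.randbelow(N)
    a = (pow(G, r, P) * pow(h, -c, P)) % P  # a = g^r * h^(-c)
    return a, c, r


if __name__ == "__main__":
    x, u = 123, 77                       # secret and re-used prover randomness
    h, a = pow(G, x, P), pow(G, u, P)
    c1, c2 = 5, 9                        # two different challenges
    r1, r2 = (u + c1 * x) % N, (u + c2 * x) % N
    print("extracted secret equals x:", extract_secret(c1, r1, c2, r2) == x)
    print("simulated transcript accepted:", verify(h, *simulate(h)))
```

In the simulated transcript nobody knows the discrete logarithm of the commitment a, which is why it cannot be used to derive the secret.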

To express privacy requirements, the knowledge of randomness is not directly rel- 
evant. In addition, assuming that the randomness of the ZK proof is freshly generated 
and not re-used elsewhere, it is clear that it cannot help to derive information indirectly: 
(⊢EZ3') is the only rule to derive personal information (namely, $m_1$) using randomness,
and it has knowledge of $n_p$ as prerequisite, which can only be derived when $m_1$ is
already known. Ignoring rules (⊢EZ2'), (⊢EZ4'), we obtain the inference rules given in
Figure 9, testability relations in Table 4, and determinability requirements in Table 5.



A.2 Anonymous Credentials and Issuing 

In an anonymous credential system, credentials $\mathrm{cred}^{M_1}_{k^-}(M_2; M_3)$ assert the link between
a user's identifier $M_1$ and her attributes $M_2$ using secret key $k^-$, and such credentials
are issued and shown anonymously [19]. Anonymous issuing means the issuer of the
credential does not learn the user's identifier $M_1$ (in particular, this means he cannot
issue credentials containing the identifier without the user's involvement). We model
the issuing protocol by the $\mathrm{ICred}^{M_1}_{k^-}(M_2; M'_3)$ primitive. The randomness $M'_3$ used in the
issuing protocol determines the randomness $M_3$ in the credential. Anonymous showing
means that it is possible to perform ZK proofs of ownership of a credential proving
certain properties. This is captured by our ZK primitive.

We model anonymous credential systems constructed from signature schemes [19,
20] as used in the Identity Mixer system [8]. In general, this construction is possible
if the signature scheme allows for issuing of signatures on committed values (Figure 20).
That is, a commitment $S_{k^-}(m_1, n_a)$ to message $m_1$ using randomness $n_a$ is
constructed using public key $k^+$ (⊢CS); this commitment is turned into signature
$S_{k^-}(m_1, m_2, n_a, n_b)$ using private key $k^-$, message $m_2$ and randomness $n_b$ (⊢CS').
Based on such a scheme, an anonymous credential $\mathrm{cred}^{m_1}_{k^-}(m_2; \{n_a, n_b\})$ is simply a
randomised signature (containing secret identifier $m_1$ and attributes $m_2$) along with its
used randomness. In the Identity Mixer system, two such signature schemes can be
used: SRSA-CL signatures [19] and BM-CL signatures [20]. There are slight technical
differences between the two; we discuss SRSA-CL signatures and briefly outline the
differences later.

$a \to b : S_{k^-}(m_1, n_2)$;
$a \to b : ZK(m_1, n_1, n_2;\ k^+, \mathcal{H}(m_1, n_1), S_{k^-}(m_1, n_2);\ \emptyset;\ \{n_3, n_4\})$;
$b \to a : \{S_{k^-}(m_1, m_2, n_2, n_5), n_5\}$;
$b \to a : ZK(k^-;\ k^+, S_{k^-}(m_1, n_2), m_2, n_5, S_{k^-}(m_1, m_2, n_2, n_5);\ \emptyset;\ \{n_6, n_7\})$
Credential obtained: $\{S_{k^-}(m_1, m_2, n_2, n_5), n_2, n_5\}$
(a) Issuing protocol for anonymous credentials

$a \to b : \mathrm{ICred}^{m_1}_{k^-}(m_2; \{n_i\}_{i=1}^{7})$
Credential obtained: $\mathrm{cred}^{m_1}_{k^-}(m_2; \{n_2, n_5\})$
(b) Formal model of anonymous credential issuing protocol

Figure 21: Anonymous credentials from signature scheme with signatures on committed
values

The anonymous credential issuing protocol can be modelled as a trace in terms of
the signature scheme (Figure 21(a)). It involves a user a and an issuer b. As before, a is
assumed to have sent a commitment $\mathcal{H}(m_1, n_1)$ to her secret identifier to b prior to
initiating the protocol. (Unlike the commitment $S_{k^-}(m_1, n_2)$ for the signature,
$\mathcal{H}(m_1, n_1)$ does not depend on $k^-$ and can thus be shared with other issuing or showing
protocols for credentials having a different key.) In the first two messages, actor a provides
her commitment for the signature, and then proves that it is formed correctly; that is, it
indeed contains the identifier corresponding to the one in $\mathcal{H}(m_1, n_1)$. Actor b uses
the commitment to construct a signature on $\{m_1, m_2, n_2, n_5\}$, and sends the signature
along with his randomness to a. At this point, a knows the signature and the two
pieces of randomness used in it: these three components together form the anonymous
credential, as shown in the figure. (Note that b does not know $n_2$, so he does not have
the complete credential.) In the last step, the signer b proves that $S_{k^-}(m_1, m_2, n_2, n_5)$
is valid; when using the SRSA-CL signature scheme, this step is technically needed to
ensure the security of the signature [8]. Figure 21(b) displays our high-level model of
the issuing protocol and the credential obtained from it.

The high-level inference rules (Figure 10), testability relation (Table 4) and
determinability relation (Table 5) for cred and ICred follow from the lower-level model in
Figure 21(a). The credential's signature can be verified using $\{k^+, m_1, m_2\}$, and a
credential can be constructed from its components (⊢CR). Although randomness can be
inferred from the credential, we do not model these inferences in the high-level model
because they are not relevant for knowledge of personal information.

From the issuing protocol, the user can infer the credential using the randomness
from the credential (⊢EI3). We check the messages of the trace for further possible
inferences. For the two ZK proofs, (⊢EZ1) does not apply because there are no proofs of
properties. The (⊢EZ2) rule can be applied to both ZK proofs occurring in the issuing
protocol; this translates to rules (⊢EI1) and (⊢EI2). We also consider the derivation of
the nonces $n_1$, $n_2$ (⊢EI2): $n_1$ is generated outside of the issuing protocol, so its
derivation may be of interest; $n_2$ is a prerequisite for (⊢EZ3). We do not add a rule to derive
$S_{k^-}(m_1, n_2)$ from the transcript because its knowledge is not interesting from a privacy
point of view. However, knowledge of this message does mean that $\{m_1, k^+, n_2\}$ can
be tested. The two testing rules for the ZK primitive each give two additional testing
rules for ICred. The third message transmission in the trace allows one to link issued
credentials to their issuing protocols.

Finally, consider the determinability requirements of $\mathrm{ICred}^{m_1}_{k^-}(m_2; \{n_i\}_{i=1}^{7})$. Assuming
fresh nonces, determinability of $\{a, b, k^+, m_1, n_2\}$ by a is required for the first
message transmission. For the first ZK proof, a additionally requires $n_1$ and $n_3$; b requires
$n_4$. The next message means determinability of $\{k^-, m_2, n_5\}$ by b. The last ZK proof
additionally means determinability of $\{k^+, n_6\}$ by b; a requires $n_7$. We get the
determinability requirements given in Table 5. Note that technically, a does not need $m_2$ to
run the protocol, and b does not need $\mathcal{H}(m_1, n_1)$; however, in practice, they will check
whether the data supplied matches their expectations using the checks expressed by the
testing rules.

We mention two modelling details regarding the use of SRSA-CL signatures for
anonymous credentials. First, the last ZK proof in the issuing trace is technically not
a proof of knowledge of the private key, but of the RSA inverse of part of the issuer's
randomness. However, in terms of knowledge this proof is equivalent because the
private key can be determined from the RSA inverse and vice versa iTOl. Second,
due to the structure of the signature, different choices for $n_a$ and $n_b$ can lead to
content-equivalent signatures. However, assuming $n_a$ and $n_b$ are chosen at random, this happens
with negligible probability.

Finally, an alternative signature scheme supporting signatures on committed values
is the BM-CL scheme [20]. There are two technical differences with the SRSA-CL-
based system presented above. First, BM-CL signatures have the additional property
that they allow "blinding": a user can turn a valid credential $\mathrm{cred}^{m_1}_{k^-}(m_2; \{n_a, n_b\})$ into
a different credential $\mathrm{cred}^{m_1}_{k^-}(m_2; \{n'_a, n_b\})$ (however, she is not able to change
randomness $n_b$). Second, the final ZK proof in the issuing protocol of Figure 21 is not
necessary for a BM-CL-based scheme. We chose the SRSA-CL-based signature scheme
because the high-level model is simpler; however, in terms of privacy the choice of
signature scheme does not matter.

Table 8: Schematic overview of the requirements in Table 6. Each row indicates
that with respect to the given coalition of actors, (a) the given items should be
(un)detectable; (b) the involvement of the given actors should be unknown; and (c)
Alice's profiles in the given domains should be (un)associable.